IntelMQ¶

General information¶
Introduction¶
About¶
IntelMQ is a solution for IT security teams (CERTs & CSIRTs, SOCs, abuse departments, etc.) for collecting and processing security feeds (such as log files) using a message queuing protocol. It's a community-driven initiative called IHAP (Incident Handling Automation Project) which was conceptually designed by European CERTs/CSIRTs during several InfoSec events. Its main goal is to give incident responders an easy way to collect & process threat intelligence, thus improving the incident handling processes of CERTs.
Incident Handling Automation Project
URL: <http://www.enisa.europa.eu/activities/cert/support/incident-handling-automation>
Mailing-list: <ihap@lists.trusted-introducer.org>
Several pieces of software are evolved around IntelMQ. For an overview, look at the IntelMQ Universe.
IntelMQ can be used for:
automated incident handling
situational awareness
automated notifications
as data collector for other tools
etc.
IntelMQ's design was influenced by AbuseHelper; however, it was rewritten from scratch and aims at:
Reducing the complexity of system administration
Reducing the complexity of writing new bots for new data feeds
Reducing the probability of losing events anywhere in the process with persistence functionality (even in case of a system crash)
Use and improve the existing Data Harmonization Ontology
Use JSON format for all messages
Provide easy way to store data into Log Collectors like ElasticSearch, Splunk, databases (such as PostgreSQL)
Provide easy way to create your own black-lists
Provide easy communication with other systems via HTTP RESTful API
It follows the following basic meta-guidelines:
Don’t break simplicity - KISS
Keep it open source - forever
Strive for perfection while keeping a deadline
Reduce complexity/avoid feature bloat
Embrace unit testing
Code readability: test with inexperienced programmers
Communicate clearly
Usage¶
Various approaches of installing IntelMQ are described in Installation.
The Configuration and Management section gives an overview of how an IntelMQ installation is set up and how to configure and maintain it. There is also a list of available Data Feeds as well as a detailed description of the different bots IntelMQ brings with it (Bots inventory).
If you know additional feeds and how to parse them, please contribute your code or your configuration (via issues or the mailing lists).
If you need help, read here about your options: Getting support.
IntelMQ Manager¶
Check out this graphical tool to easily manage an IntelMQ system.
Contribute¶
Subscribe to the IntelMQ Developers Mailinglist
Via GitHub issues
Via Pull requests (please have a look at the Developers Guide first)
IntelMQ Organizational Structure¶
The central IntelMQ components are maintained by multiple people and organizations in the IntelMQ community. Please note that some components of the IntelMQ Universe can have a different project governance, but all are part of the IntelMQ community.
IntelMQ Enhancement Proposals (IEP)¶
Major changes, including architecture, strategy and the internal data format, require so-called IEPs, IntelMQ Enhancement Proposals. Their name is based on the famous “PEPs” of Python.
IEPs are collected in the separate iep repository.
Code-Reviews and Merging¶
Every line of code checked in for the IntelMQ Core is reviewed by at least one trusted developer (excluding the author of the changes) of the IntelMQ community. Afterwards, the code can be merged. Currently, these three contributors have the permission to push and merge code to IntelMQ Core, Manager and API:
Aaron Kaplan (aaronkaplan)
Sebastian Wagner (sebix)
Sebastian Waldbauer (waldbauer-certat)
Additionally, these people significantly contributed to IntelMQ:
Bernhard Reiter
Birger Schacht
Edvard Rejthar
Filip Pokorný
Karl-Johan Karlsson
Marius Karotkis
Marius Urkus
Mikk Margus Möll
navtej
Pavel Kácha
Robert Šefr
Tomas Bellus
Zach Stone
Short history¶
The idea and overall concept of a free, simple and extendable software for automated incident handling was born at a meeting of several European CSIRTs in Heraklion, Greece, in 2014. Following the event, Tomás Lima "SYNchroACK" (working at CERT.pt back then) created IntelMQ from scratch. IntelMQ was born on June 24th, 2014. Major support came from CERT.pt at this early stage. Aaron Kaplan (CERT.at until 2020) engaged in the long-term advancement, and from 2015 on, CERT.at took over the maintenance and development (Sebastian Wagner 2015-2021 at CERT.at). From 2016 onward, CERT.at started projects, initiated and led by Aaron Kaplan, receiving CEF funding from the European Union to support IntelMQ's development. IntelMQ became a software component of the EU-funded MeliCERTes framework for CSIRTs.
In 2020, IntelMQ's organizational structure and architectural development gained new momentum with the newly founded Board and the start of the IEP process, creating more structure and more transparency in the IntelMQ community's decisions.
Getting support¶
In case you are lost, need assistance, or something is not discussed in this guide, you can ask the community for help.
General tips¶
To be most efficient in seeking help, please describe your problem or question with all necessary information, for example:
Name and version of the operating system
Way of installation (deb/rpm packages, PyPI, local git repository)
Used bots and configuration
Logs of bots or terminal output
Any other useful messages, screenshots
Mailing list¶
The most traditional way is to ask your question, make a proposal or discuss a topic on the IntelMQ Users Mailinglist. You need to subscribe to the mailing list before posting, but the archive is publicly available: IntelMQ-Users Archive.
GitHub¶
To report bugs, GitHub issues are the ideal place to do so. Every IntelMQ component has its own repository on GitHub, with a separate issue tracker.
GitHub also offers a discussion platform.
To participate on GitHub, you first need to create an account on the platform.
Assistance¶
If your organisation is a member of the CSIRTs Network, you are eligible for support in the MeliCERTes project. You can also ask for individual support on the IntelMQ Users Mailinglist; some members offer support, including, but not limited to:
Aaron Kaplan (founder of IntelMQ)
Institute for Common Good Technology (chairman Sebastian Wagner is an IntelMQ maintainer and developer)
Intevation GmbH (develops and maintains several IntelMQ components)
Development¶
Mailing list¶
There is a separate mailing list for developers to discuss development topics: the IntelMQ Developers Mailinglist. The IntelMQ-Dev Archive is public as well.
Please also read the Developers Guide.
GitHub¶
The ideal way to propose changes and additions to IntelMQ is to open a Pull Request on GitHub.
User guide¶
Hardware Requirements¶
Do you ask yourself how much RAM you need to give your new IntelMQ virtual machine?
The honest answer is simple and pointless: it depends ;)
IntelMQ and the messaging queue (broker)¶
IntelMQ uses a messaging queue to move the messages between the bots. All bot instances can only process one message at a time, therefore all other messages need to wait in the queue. As not all bots are equally fast, the messages will naturally “queue up” before the slower ones. Further, parsers produce many events with just one message (the report) as input.
The following estimations assume Redis as messaging broker which is the default for IntelMQ. When RabbitMQ is used, the required resources will differ, and RabbitMQ can handle system overload and therefore a shortage of memory.
As Redis stores all data in memory, the data which is processed at any point in time must fit there, including overheads. Please note that IntelMQ does neither store nor cache any input data. These estimates therefore only relate to the processing step, not the storage.
For a minimal system, these requirements suffice:
4 GB of RAM
2 CPUs
10 GB disk size
Depending on your data input, you will need about twenty times the input data size as memory for processing.
When using Redis persistence, you will additionally need twice as much memory for Redis.
Disk space¶
Disk space is only relevant if you save your data to a file, which is not recommended for production setups, and only useful for testing and evaluation.
Do not forget to rotate your logs or use syslog, especially if you use the logging level "DEBUG". logrotate is in use by default for all installations with deb/rpm packages. When other means of installation are used (pip, manual), configure log rotation manually. See Logging.
Background on memory¶
For experimentation, we used multiple Shadowserver Poodle reports for demonstration purposes, totaling 120 MB of data. All numbers are estimates and rounded. In memory, the report data requires 160 MB. After parsing, the memory usage increases to 850 MB in total, as every data line is stored as JSON, with additional information plus the original data encoded in Base64. The further processing steps depend on the configuration, but you can estimate that caches (for lookups and deduplication) and other added information cause an additional size increase of about 2x. Once a dataset has finished processing in IntelMQ, it is no longer stored in memory. Therefore, the memory is only needed to catch high load.
The above numbers result in a factor of 14 for input data size vs. memory required by Redis. Assuming some overhead and memory for the bots’ processes, a factor of 20 seems sensible.
To reduce the amount of required memory and disk size, you can optionally remove the raw data field, see Removing raw data for higher performance and less space usage in the FAQ.
Additional components¶
If some of the optional components of the ecosystem are in use, they can add additional hardware requirements.
Those components do not add relevant requirements:
IntelMQ API: It is just an API for intelmqctl.
IntelMQ Manager: Only contains static files served by the webserver.
IntelMQ Webinput CSV: Just a webinterface to insert data. Requires the amount of processed data to fit in memory, see above.
Stats Portal: The aggregation step and Grafana require some resources, but no exact numbers are known.
Malware Name Mapping
Docker: The docker layer adds only minimal hardware requirements.
EventDB¶
When storing data in databases (such as MongoDB, PostgreSQL, ElasticSearch), it is recommended to do this on separate machines for operational reasons. Using a different machine separates stream processing from data storage and allows for a specialized system optimization for both use-cases.
IntelMQ cb mailgen¶
While the Fody backend and frontend do not have significant requirements, the RIPE import tool of the certbund-contact requires about 8 GB of memory as of March 2021.
Installation¶
Please report any errors and suggest improvements at IntelMQ Issues. Thanks!
For upgrade instructions, see Upgrade instructions. For testing pre-releases see also Testing Pre-releases.
Following any one of the installation methods will set up the IntelMQ base. Some bots may have additional special dependencies which are mentioned in their own documentation.
The following installation methods are available:
native .deb/.rpm packages
Docker, with and without docker-compose
Python package from PyPI
From the git-repository, see Development Environment
Base Requirements¶
The instructions below assume the following requirements. Python versions >= 3.7 are supported.
Supported and recommended operating systems are:
Debian 10 Buster and 11 Bullseye
openSUSE Tumbleweed
Ubuntu: 20.04 focal
For the Docker-installation: Docker Engine: 18.x and higher
Other distributions which are (most probably) supported include AlmaLinux, CentOS, Fedora, FreeBSD 12, RHEL and RockyLinux.
A short guide on hardware requirements can be found on the page Hardware Requirements.
Native deb/rpm packages¶
These are the operating systems which are currently supported by packages:
Debian 10 Buster
Debian 11 Bullseye
openSUSE Tumbleweed
Ubuntu 20.04 Focal Fossa (enable the universe repositories first by appending universe to the line deb http://[...].archive.ubuntu.com/ubuntu/ focal main in /etc/apt/sources.list)
Get the installation instructions for your operating system here: Installation Native Packages. The instructions show how to add the repository and install the intelmq package. You can also install the intelmq-manager package to get the Web-Frontend IntelMQ Manager.
Docker¶
Attention: Currently you can't manage your botnet via intelmqctl (see the intelmqctl documentation). You currently need to use the IntelMQ Manager!
The latest IntelMQ image is hosted on Docker Hub and the image build instructions are in our intelmq-docker repository: https://github.com/certat/intelmq-docker
Follow Docker Install and Docker-Compose Install instructions.
Before you start using docker-compose or any docker related tools, make sure docker is running:
# To start the docker daemon
systemctl start docker.service
# To enable the docker daemon for the future
systemctl enable docker.service
Now we can download IntelMQ and start the containers. Navigate to your preferred installation directory and run the following commands:
git clone https://github.com/certat/intelmq-docker.git --recursive
cd intelmq-docker
sudo docker-compose pull
sudo docker-compose up
Your installation should be successful now. You're now able to visit http://127.0.0.1:1337/ to access the intelmq-manager. You have to log in with the username intelmq and the password intelmq. If you want to change the username or password, you can do this by adding the environment variables INTELMQ_API_USER for the username and INTELMQ_API_PASS for the password.
NOTE: If you get a Permission denied error, you should run chown -R $USER:$USER example_config.
With pip from PyPI¶
Requirements¶
Ubuntu / Debian
apt install python3-pip python3-dnspython python3-psutil python3-redis python3-requests python3-termstyle python3-tz python3-dateutil redis-server bash-completion jq
# optional dependencies
apt install python3-pymongo python3-psycopg2
CentOS 7 / RHEL 7:
yum install epel-release
yum install python36 python36-dns python36-requests python3-setuptools redis bash-completion jq
yum install gcc gcc-c++ python36-devel
# optional dependencies
yum install python3-psycopg2
Note
We no longer support Python 3.6, which is already end-of-life and the last Python version officially packaged for CentOS Linux 7. You can either use an alternative Python source or stay on IntelMQ 3.0.2.
CentOS 8:
dnf install epel-release
dnf install python3-dateutil python3-dns python3-pip python3-psutil python3-redis python3-requests redis bash-completion jq
# optional dependencies
dnf install python3-psycopg2 python3-pymongo
openSUSE:
zypper install python3-dateutil python3-dnspython python3-psutil python3-redis python3-requests python3-python-termstyle redis bash-completion jq
# optional dependencies
zypper in python3-psycopg2 python3-pymongo
Installation¶
The base directory is /opt/intelmq/, if the environment variable INTELMQ_ROOT_DIR is not set to something else; see /opt and LSB paths for more information.
sudo -i
pip3 install intelmq
useradd -d /opt/intelmq -U -s /bin/bash intelmq
sudo intelmqsetup
intelmqsetup will create all necessary directories and provide a default configuration for new setups. See Configuration for more information on them and how to influence them.
Docker without docker-compose¶
If not already installed, please install Docker.
Navigate to your preferred installation directory and run git clone https://github.com/certat/intelmq-docker.git --recursive.
You need to prepare some volumes & configs. Edit the left side after -v to change paths.
Change redis_host to a running redis instance. Docker will resolve it automatically. All containers are connected using Docker networks.
In order to work with your current infrastructure, you need to specify some environment variables:
sudo docker pull redis:latest
sudo docker pull certat/intelmq-full:latest
sudo docker pull certat/intelmq-nginx:latest
sudo docker network create intelmq-internal
sudo docker run -v ~/intelmq/example_config/redis/redis.conf:/redis.conf \
--network intelmq-internal \
--name redis \
redis:latest
sudo docker run --network intelmq-internal \
--name nginx \
certat/intelmq-nginx:latest
sudo docker run -e INTELMQ_IS_DOCKER="true" \
-e INTELMQ_SOURCE_PIPELINE_BROKER="redis" \
-e INTELMQ_PIPELINE_BROKER="redis" \
-e INTELMQ_DESTINATION_PIPELINE_BROKER="redis" \
-e INTELMQ_PIPELINE_HOST=redis \
-e INTELMQ_SOURCE_PIPELINE_HOST=redis \
-e INTELMQ_DESTINATION_PIPELINE_HOST=redis \
-e INTELMQ_REDIS_CACHE_HOST=redis \
-v $(pwd)/example_config/intelmq/etc/:/etc/intelmq/etc/ \
-v $(pwd)/example_config/intelmq-api/config.json:/etc/intelmq/api-config.json \
-v $(pwd)/intelmq_logs:/etc/intelmq/var/log \
-v $(pwd)/intelmq_output:/etc/intelmq/var/lib/bots \
-v ~/intelmq/lib:/etc/intelmq/var/lib \
--network intelmq-internal \
--name intelmq \
certat/intelmq-full:latest
If you want to use another username and password for the intelmq-manager / api login, additionally add two new environment variables:
-e INTELMQ_API_USER="your username" \
-e INTELMQ_API_PASS="your password"
Upgrade instructions¶
For installation instructions, see Installation.
Read NEWS.md¶
Read the NEWS.md file to find out which changes you need to have a look at.
Stop IntelMQ and create a Backup¶
Make sure that your IntelMQ system is completely stopped: intelmqctl stop
Create a backup of IntelMQ Home directory, which includes all configurations. They are not overwritten, but backups are always nice to have!
sudo cp -R /opt/intelmq /opt/intelmq-backup
Upgrade IntelMQ¶
Before upgrading, check that your setup is clean and there are no events in the queues:
intelmqctl check
intelmqctl list queues -q
The upgrade depends on how you installed IntelMQ.
Packages¶
Use your system's package management.
Docker (beta)¶
You can check out all current versions on our DockerHub.
docker pull certat/intelmq-full:latest
docker pull certat/intelmq-nginx:latest
Alternatively you can use docker-compose:
docker-compose pull
You can check the current versions of intelmq, intelmq-manager and intelmq-api via the git commit ref.
The version format for each included item is key=value and they are separated by commas, e.g. IntelMQ=ab12cd34f, IntelMQ-API=xy65z23.
docker inspect --format '{{ index .Config.Labels "org.opencontainers.image.version" }}' intelmq-full:latest
Now restart your container. If you're using docker-compose, you simply write:
docker-compose down
If you don't use docker-compose, you can restart a single container using:
docker ps | grep certat
docker stop CONTAINER_ID
PyPI¶
pip install -U --no-deps intelmq
sudo intelmqsetup
Using --no-deps will not upgrade dependencies, which would probably overwrite the system's libraries. Remove this option to also upgrade dependencies.
Local repository¶
If you have an editable installation, refer to the instructions in the Developers Guide.
Update the repository depending on your setup (e.g. git pull origin master).
And run the installation again:
pip install .
sudo intelmqsetup
For editable installations (development only), run pip install -e . instead.
Upgrade configuration and check the installation¶
Go through NEWS.md and apply the necessary adaptations to your setup. If you have adapted IntelMQ's code, also read the CHANGELOG.md.
Check your installation and configuration to detect any problems:
intelmqctl upgrade-config
intelmqctl check
intelmqctl upgrade-config supports upgrades from one IntelMQ version to the succeeding one. If you skip one or more IntelMQ versions, some automatic upgrades may not work and manual intervention may be necessary.
Start IntelMQ¶
intelmqctl start
Configuration and Management¶
For installation instructions, see Installation. For upgrade instructions, see Upgrade instructions.
Configure services¶
You need to enable and start Redis if not already done. Using systemd it can be done with:
systemctl enable redis.service
systemctl start redis.service
Configuration¶
/opt and LSB paths¶
If you installed the packages, standard Linux paths (LSB paths) are used: /var/log/intelmq/, /etc/intelmq/, /var/lib/intelmq/, /var/run/intelmq/.
Otherwise, the configuration directory is /opt/intelmq/etc/. Using the environment variable INTELMQ_ROOT_DIR allows setting any arbitrary root directory.
You can switch this by setting the environment variables INTELMQ_PATHS_NO_OPT and INTELMQ_PATHS_OPT, respectively.
When installing the Python packages, you can set INTELMQ_PATHS_NO_OPT to something non-empty to use LSB-paths.
When installing the deb/rpm packages, you can set INTELMQ_PATHS_OPT to something non-empty to use /opt/intelmq/ paths, or a path set with INTELMQ_ROOT_DIR.
The environment variable ROOT_DIR is meant to set an alternative root directory instead of /. This is primarily meant for package build environments and is analogous to setuptools' --root parameter. Thus it is only used in LSB-mode.
Overview¶
The main configuration file is formatted in YAML since IntelMQ 3.0 (before, it was JSON, which had some downsides). However, comments in YAML are currently not preserved by IntelMQ (known bug #2003). For new installations a default setup with some examples is provided by the intelmqsetup tool. If this is not the case, make sure the program was run (see Installation instructions).
runtime.yaml: Configuration for the individual bots. See Bots inventory for more details.
harmonization.conf: Configuration of the internal data format, see Data Format and Harmonization field names.
To configure a new bot, you need to define and configure it in runtime.yaml. You can base your configuration on the output of intelmqctl list bots and the Data Feeds documentation page.
Use the IntelMQ Manager mentioned above to generate the configuration files if unsure.
In the shipped examples 4 collectors and parsers, 6 common experts and one output are configured. The default collector and the parser handle data from malware domain list, the file output bot writes all data to /opt/intelmq/var/lib/bots/file-output/events.txt or /var/lib/intelmq/bots/file-output/events.txt, respectively.
Systemwide Configuration (global)¶
All bots inherit the global configuration parameters in the runtime.yaml and they can overwrite them using the same parameters in their individual configuration in the runtime.yaml file.
Logging¶
The logging can be configured with the following parameters:
logging_handler: Can be one of "file" or "syslog".
logging_level: Defines the system-wide log level that will be used by all bots and the intelmqctl tool. Possible values are: "CRITICAL", "ERROR", "WARNING", "INFO" and "DEBUG".
logging_path: If logging_handler is file: defines the system-wide log folder that will be used by all bots and the intelmqctl tool. Default value: /opt/intelmq/var/log/ or /var/log/intelmq/ respectively.
logging_syslog: If logging_handler is syslog: either a list with hostname and UDP port of the syslog service, e.g. ["localhost", 514], or a device name/path, e.g. the default "/var/log".
We recommend logging_level WARNING for production environments and INFO if you want more details. In any case, watch your free disk space!
Log rotation¶
To rotate the logs, you can use the standard Linux-tool logrotate.
An example logrotate configuration is given in contrib/logrotate/ and delivered with all deb/rpm-packages.
When not using logrotate, IntelMQ can rotate the logs itself; this is not enabled by default! You need to set both of the following values.
logging_max_size: Maximum number of bytes to be stored in one logfile before the file is rotated (default: 0, equivalent to unset).
logging_max_copies: Maximum number of logfiles to keep (default: unset). Compression is not supported.
Some information can as well be found in Python’s documentation on the used RotatingFileHandler.
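A sketch of a logging configuration in the global section of runtime.yaml (see Systemwide Configuration above); all values are illustrative, not recommendations:
global:
  logging_handler: file
  logging_level: WARNING
  logging_path: /var/log/intelmq/
  # built-in rotation, only needed when logrotate is not used:
  logging_max_size: 10485760
  logging_max_copies: 5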
Error Handling¶
error_log_message - in case of an error, this option will allow the bot to write the message (report or event) to the log file. Use the following values:
true/false - write or not write message to the log file
error_log_exception - in case of an error, this option will allow the bot to write the error exception to the log file. Use the following values:
true/false - write or not write exception to the log file
error_procedure - in case of an error, this option defines the procedure that the bot will adopt. Use the following values:
stop - stop bot after retrying X times (as defined in error_max_retries) with a delay between retries (as defined in error_retry_delay). If the bot reaches the error_max_retries value, it will remove the message from the pipeline and stop. If the option error_dump_message is also enabled, the bot will dump the removed message to its dump file (to be found in var/log).
pass - will skip this message and will process the next message after retrying X times, removing the current message from the pipeline. If the option error_dump_message is also enabled, then the bot will dump the removed message to its dump file. After max retries are reached, the rate limit is applied (e.g. a collector bot fetching an unavailable resource does not retry forever).
error_max_retries - in case of an error, the bot will try to re-start processing the current message X times as defined by this option. int value.
error_retry_delay - defines the number of seconds to wait between subsequent re-tries in case of an error. int value.
error_dump_message - specifies if the bot will write queued up messages to its dump file (use intelmqdump to re-insert the message).
true/false - write or not write message to the dump file
If the path _on_error exists for a bot, the message is also sent to this queue, instead of (only) being dumped to the file, if configured to do so.
Miscellaneous¶
load_balance - this option allows you to choose the behavior of the queue. Use the following values:
true - splits the messages into several queues without duplication
false - duplicates the messages into each queue
When using AMQP as message broker, take a look at the Multithreading (Beta) section and the instances_threads parameter.
rate_limit - time interval (in seconds) between messages processing. int value.
ssl_ca_certificate - trusted CA certificate for IMAP connections (supported by some bots).
source_pipeline_broker & destination_pipeline_broker - select which broker IntelMQ should use. There are two options:
redis (default) - Please note that persistence has to be manually activated.
amqp - The AMQP pipeline is currently beta but there are no known issues. A popular AMQP broker is RabbitMQ. See AMQP (Beta) for more details.
As these parameters can be set per bot, this allows usage of different broker systems and hosts, as well as switching between them on the same IntelMQ instance.
source_pipeline_host - broker IP, FQDN or Unix socket that the bot will use to connect and receive messages.
source_pipeline_port - broker port that the bot will use to connect and receive messages. Can be empty for Unix socket.
source_pipeline_password - broker password that the bot will use to connect and receive messages. Can be null for unprotected broker.
source_pipeline_db - broker database that the bot will use to connect and receive messages (requirement from redis broker).
destination_pipeline_host - broker IP, FQDN or Unix socket that the bot will use to connect and send messages.
destination_pipeline_port - broker port that the bot will use to connect and send messages. Can be empty for Unix socket.
destination_pipeline_password - broker password that the bot will use to connect and send messages. Can be null for unprotected broker.
destination_pipeline_db - broker database that the bot will use to connect and send messages (requirement from redis broker).
http_proxy - HTTP proxy that the bot will use when performing HTTP requests (e.g. bots/collectors/collector_http.py). The value must follow RFC 1738.
https_proxy - HTTPS proxy that the bot will use when performing secure HTTPS requests (e.g. bots/collectors/collector_http.py).
http_user_agent - user-agent string that the bot will use when performing HTTP/HTTPS requests (e.g. bots/collectors/collector_http.py).
http_verify_cert - defines if the bot will verify SSL certificates when performing HTTPS requests (e.g. bots/collectors/collector_http.py).
true/false - verify or not verify SSL certificates
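As a sketch, these parameters could be set for all bots in the global section of runtime.yaml, here assuming the default Redis broker on localhost (all values illustrative):
global:
  source_pipeline_broker: redis
  destination_pipeline_broker: redis
  source_pipeline_host: 127.0.0.1
  source_pipeline_port: 6379
  source_pipeline_db: 2
  destination_pipeline_host: 127.0.0.1
  destination_pipeline_port: 6379
  destination_pipeline_db: 2
  http_verify_cert: true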
Using supervisor as process manager (Beta)¶
First of all: Do not use it in production environments yet! It has not been tested thoroughly yet.
Supervisor is a process manager written in Python. Its main advantage is that it takes care of processes: if a bot process exits with a failure (exit code different from 0), supervisor tries to run it again. Another advantage is that it does not require writing PID files.
This was tested on Ubuntu 18.04.
Install supervisor. supervisor_twiddler is an extension for supervisor that makes it possible to create processes dynamically. (The Ubuntu supervisor package is currently based on Python 2, so supervisor_twiddler must be installed with the Python 2 pip.)
apt install supervisor python-pip
pip install supervisor_twiddler
Create the default config /etc/supervisor/conf.d/intelmq.conf and restart the supervisor service:
[rpcinterface:twiddler]
supervisor.rpcinterface_factory=supervisor_twiddler.rpcinterface:make_twiddler_rpcinterface
[group:intelmq]
Change IntelMQ process manager in the global configuration:
process_manager: supervisor
After this it is possible to manage bots like before with the intelmqctl command.
Runtime Configuration¶
This configuration is used by each bot to load its specific (runtime) parameters. The IntelMQ Manager can generate this configuration for you. You may edit it manually as well. Be sure to re-load the bot (see the intelmqctl documentation).
Template:
<bot ID>:
group: <bot type (Collector, Parser, Expert, Output)>
name: <human-readable bot name>
module: <bot code (python module)>
description: <generic description of the bot>
parameters:
<parameter 1>: <value 1>
<parameter 2>: <value 2>
<parameter 3>: <value 3>
Example:
blocklistde-apache-collector:
group: Collector
name: Blocklist.de Apache List
module: intelmq.bots.collectors.http.collector_http
description: Blocklist.de Apache Collector fetches all IP addresses which have been reported within the last 48 hours as having run attacks on the service Apache, Apache-DDOS, RFI-Attacks.
parameters:
http_url: https://lists.blocklist.de/lists/apache.txt
name: Blocklist.de Apache
rate_limit: 3600
More examples can be found in the intelmq/etc/runtime.yaml file. See Bots inventory for more details.
By default, all of the bots are started when you start the whole botnet, however there is a possibility to disable a bot. This means that the bot will not start every time you start the botnet, but you can start and stop the bot if you specify the bot explicitly. To disable a bot, add the following to your runtime.yaml: "enabled": false. For example:
blocklistde-apache-collector:
group: Collector
name: Blocklist.de Apache List
module: intelmq.bots.collectors.http.collector_http
description: Blocklist.de Apache Collector fetches all IP addresses which have been reported within the last 48 hours as having run attacks on the service Apache, Apache-DDOS, RFI-Attacks.
enabled: false
parameters:
http_url: https://lists.blocklist.de/lists/apache.txt
name: Blocklist.de Apache
rate_limit: 3600
Pipeline Configuration¶
The pipeline configuration defines how the data is exchanged between the bots. For each bot, it defines the source queue (there is always only one) and one or multiple destination queues. This section shows the possibilities and definitions as well as examples. The configuration of the pipeline can be done by the IntelMQ Manager with no need to intervene manually. It is recommended to use this tool as it guarantees that the configuration is correct. The configuration of the pipelines is done in the runtime.yaml as part of the individual bots' settings.
Source queue¶
This setting is optional; by default, the source queue is the bot ID with "-queue" appended. For example, if the bot ID is example-bot, the source queue name is example-bot-queue.
source-queue: example-bot-queue
For collectors, this field does not exist, as they fetch the data from outside the IntelMQ system by definition.
Destination queues¶
Destination queues are defined using a dictionary with a name as key and a list of queue-identifiers as the value.
destination-queues:
_default:
- <first destination pipeline name>
- <second destination pipeline name>
_on_error:
- <optional first destination pipeline name in case of errors>
- <optional second destination pipeline name in case of errors>
other-path:
- <second destination pipeline name>
- <third destination pipeline name>
In this case, the bot will be able to send the message to one of the defined paths. The path "_default" is used if none is specified by the bot itself. In case of errors during processing, and if the optional path "_on_error" is specified, the message will be sent to the pipelines given as _on_error.
Other destination queues can be explicitly addressed by the bots, e.g. bots with filtering capabilities. Some expert bots are capable of sending messages to paths, this feature is explained in their documentation, e.g. the Filter expert and the Sieve expert.
The named queues need to be explicitly addressed by the bot (e.g. filtering) or the core (_on_error) to be used. Setting arbitrary paths has no effect.
AMQP (Beta)¶
Starting with IntelMQ 1.2 the AMQP protocol is supported as message queue. To use it, install a broker, for example RabbitMQ. The configuration and the differences are outlined here. Keep in mind that it is slower, but has better monitoring capabilities and is more stable. The AMQP support is considered beta, so small problems might occur. So far, only RabbitMQ as broker has been tested.
You can change the broker for single bots (set the parameters in the runtime configuration per bot) or for the whole botnet (using the global configuration).
You need to set the parameter source_pipeline_broker/destination_pipeline_broker to amqp. There are more parameters available:
destination_pipeline_broker: "amqp"
destination_pipeline_host (default: '127.0.0.1')
destination_pipeline_port (default: 5672)
destination_pipeline_username
destination_pipeline_password
destination_pipeline_socket_timeout (default: no timeout)
destination_pipeline_amqp_exchange: Only change/set this if you know what you do. If set, the destination queues are not declared as queues, but used as routing key. (default: '')
destination_pipeline_amqp_virtual_host (default: '/')
source_pipeline_host (default: '127.0.0.1')
source_pipeline_port (default: 5672)
source_pipeline_username
source_pipeline_password
source_pipeline_socket_timeout (default: no timeout)
source_pipeline_amqp_exchange: Only change/set this if you know what you do. If set, the destination queues are not declared as queues, but used as routing key. (default: '')
source_pipeline_amqp_virtual_host (default: '/')
intelmqctl_rabbitmq_monitoring_url: string, see below (default: "http://{host}:15672")
For getting the queue sizes, intelmqctl needs to connect to the monitoring interface of RabbitMQ. If the monitoring interface is not available under http://{host}:15672, you can set it manually using the parameter intelmqctl_rabbitmq_monitoring_url.
In RabbitMQ's default configuration you might not need to provide a user account, as by default the administrator (guest:guest) allows full access from localhost. If you create a separate user account, make sure to add the tag "monitoring" to it, otherwise IntelMQ can't fetch the queue sizes.
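Putting the pieces together, an AMQP pipeline configuration in the global section of runtime.yaml might look like this sketch (credentials and monitoring URL are illustrative):
global:
  source_pipeline_broker: amqp
  destination_pipeline_broker: amqp
  source_pipeline_host: 127.0.0.1
  source_pipeline_port: 5672
  source_pipeline_username: intelmq
  source_pipeline_password: secret
  destination_pipeline_host: 127.0.0.1
  destination_pipeline_port: 5672
  destination_pipeline_username: intelmq
  destination_pipeline_password: secret
  intelmqctl_rabbitmq_monitoring_url: "http://127.0.0.1:15672"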

Setting the statistics (and cache) parameters is necessary when the local redis is running under a non-default host/port. If this is the case, you can set them explicitly:
statistics_database: 3
statistics_host: "127.0.0.1"
statistics_password: null
statistics_port: 6379
Multithreading (Beta)¶
First of all: Do not use it in production environments yet! There are a few bugs, see below.
Since IntelMQ 2.0 it is possible to provide the parameter instances_threads: set it to a non-zero integer, then this number of worker threads will be spawned. This is useful if bots often wait for system resources or if network-based lookups are a bottleneck.
However, there are currently a few caveats:
This is not possible for all bots, there are some exceptions (collectors and some outputs), see the Frequently asked questions for some reasons.
Only use it with the AMQP pipeline, as with Redis, messages may get duplicated because there's only one internal queue.
In the logs, you can see the main thread initializing first, then all of the threads, which log with the name [bot-id].[thread-id].
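As a sketch, enabling worker threads for a single bot in runtime.yaml might look like this (bot id and module are hypothetical; remember the restriction to the AMQP pipeline noted above):
example-lookup-expert:
  group: Expert
  name: Example Lookup Expert
  module: intelmq.bots.experts.example.expert
  parameters:
    instances_threads: 4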
Harmonization Configuration¶
This configuration is used to specify the fields for all message types. The harmonization library will load this configuration to check, during the message processing, if the values are compliant to the "harmonization" format. Usually, this configuration doesn't need any change. It is mostly maintained by the IntelMQ maintainers.
Template:
{
"<message type>": {
"<field 1>": {
"description": "<field 1 description>",
"type": "<field value type>"
},
"<field 2>": {
"description": "<field 2 description>",
"type": "<field value type>"
}
},
}
Example:
{
"event": {
"destination.asn": {
"description": "The autonomous system number from which originated the connection.",
"type": "Integer"
},
"destination.geolocation.cc": {
"description": "Country-Code according to ISO3166-1 alpha-2 for the destination IP.",
"regex": "^[a-zA-Z0-9]{2}$",
"type": "String"
},
},
}
More examples can be found in the intelmq/etc/harmonization.conf file.
Utilities¶
Management¶
IntelMQ has a modular structure consisting of bots. There are four types of bots:
Collector Bots retrieve data from internal or external sources, the output are reports consisting of many individual data sets / log lines.
Parser Bots parse the (report) data by splitting it into individual events (log lines) and giving them a defined structure, see also Data Format for the list of fields an event may be split up into.
Expert Bots enrich the existing events by e.g. looking up information such as DNS reverse records, geographic location information (country code) or abuse contacts for an IP address or domain name.
Output Bots write events to files, databases, (REST)-APIs or any other data sink that you might want to write to.
Each bot has one source queue (except collectors) and can have multiple destination queues (except outputs). But multiple bots can write to the same pipeline (queue), resulting in multiple inputs for the next bot.
Every bot runs in a separate process. A bot is identifiable by a bot id.
Currently only one instance (i.e. with the same bot id) of a bot can run at the same time. Concepts for multiprocessing are being discussed, see this issue: Multiprocessing per queue is not supported #186. Currently you can run multiple processes of the same bot (with different bot ids) in parallel.
Example: multiple gethostbyname bots (with different bot ids) may run in parallel, with the same input queue and sending to the same output queue. Note that the bot providing the input queue must have the load_balance option set to true.
Web interface: IntelMQ Manager¶
IntelMQ has a tool called IntelMQ Manager that gives users an easy way to configure all pipelines with bots that your team needs. For beginners, it’s recommended to use the IntelMQ Manager to become acquainted with the functionalities and concepts. The IntelMQ Manager offers some of the possibilities of the intelmqctl tool and has a graphical interface for runtime and pipeline configurations.
See the IntelMQ Manager repository.
Command-line interface: intelmqctl¶
For the syntax, see intelmqctl -h
Starting a bot:
intelmqctl start bot-id
Stopping a bot:
intelmqctl stop bot-id
Reloading a bot:
intelmqctl reload bot-id
Restarting a bot:
intelmqctl restart bot-id
Get status of a bot:
intelmqctl status bot-id
Run a bot directly for debugging purposes and temporarily raise the logging level to DEBUG:
intelmqctl run bot-id
Get a pdb (or ipdb if installed) live console.
intelmqctl run bot-id console
See the message that waits in the input queue.
intelmqctl run bot-id message get
See additional help for further explanation.
intelmqctl run bot-id --help
Starting the botnet (all bots):
intelmqctl start
Starting a group of bots:
intelmqctl start --group experts
Get a list of all configured bots:
intelmqctl list bots
Get a list of all queues:
intelmqctl list queues
If -q is given, only queues with more than one item are listed.
Get a list of all queues and status of the bots:
intelmqctl list queues-and-status
Clear a queue:
intelmqctl clear queue-id
Get logs of a bot:
intelmqctl log bot-id number-of-lines log-level
Reads the last lines from bot log. Log level should be one of DEBUG, INFO, ERROR or CRITICAL. Default is INFO. Number of lines defaults to 10, -1 gives all. Result can be longer due to our logging format!
Upgrade from a previous version:
intelmqctl upgrade-config
Make a backup of your configuration first, also including the bots' configuration files.
Botnet Concept¶
The “botnet” represents all currently configured bots which are explicitly enabled. It is, in essence, the graph of the bots which are connected together via their input source queues and destination queues.
To get an overview which bots are running, use intelmqctl status or use the IntelMQ Manager. Set "enabled": true in the runtime configuration to add a bot to the botnet. By default, bots will be configured as "enabled": true. See Bots inventory for more details on configuration.
Disabled bots can still be started explicitly using intelmqctl start <bot_id>, but will remain in the state disabled if stopped (and not be implicitly enabled by the start command). They are not started by intelmqctl start, in analogy to the behavior of widely used initialization systems.
Scheduled Run Mode¶
In many cases, it is useful to schedule a bot at a specific time (e.g. via cron(1)), for example to collect information from a website every day at midnight. To do this, set run_mode to scheduled in the runtime.yaml for the bot. Check out the following example:
blocklistde-apache-collector:
name: Generic URL Fetcher
group: Collector
module: intelmq.bots.collectors.http.collector_http
description: All IP addresses which have been reported within the last 48 hours as having run attacks on the service Apache, Apache-DDOS, RFI-Attacks.
enabled: false
run_mode: scheduled
parameters:
feed: Blocklist.de Apache
provider: Blocklist.de
http_url: https://lists.blocklist.de/lists/apache.txt
ssl_client_certificate: null
You can schedule the bot with a crontab-entry like this:
0 0 * * * intelmqctl start blocklistde-apache-collector
Bots configured as scheduled will exit after the first successful run. Setting enabled to false will cause the bot to not start with intelmqctl start, but only with an explicit start, in this example intelmqctl start blocklistde-apache-collector.
Continuous Run Mode¶
In most cases, bots will need to be configured with the continuous run mode (the default) in order to have them always running and processing events. Usually, the types of bots that require the continuous mode are Parsers, Experts and Outputs. To do this, set run_mode to continuous in the runtime.yaml for the bot. Check the following example:
blocklistde-apache-parser:
name: Blocklist.de Parser
group: Parser
module: intelmq.bots.parsers.blocklistde.parser
description: Blocklist.DE Parser is the bot responsible to parse the report and sanitize the information.
enabled: false
run_mode: continuous
parameters: ...
You can now start the bot using the following command:
intelmqctl start blocklistde-apache-parser
Bots configured as continuous will never exit, except if there is an error and the error handling configuration requires the bot to exit. See the Error Handling section for more details.
Reloading¶
Whilst restart is a mere stop & start, performing intelmqctl reload <bot_id> will not stop the bot, permitting it to keep the state: the same common behavior as for (Linux) daemons. It will initialize again (including reading all configuration again) after the current action is finished. Also, the rate limit/sleep is continued (with the new time) and not interrupted like with the restart command. So if you have a collector with a rate limit of 24 h, the reload does not trigger a new fetching of the source at the time of the reload, but just 24 h after the last run – with the new configuration.
Which state the bots are keeping depends on the bots of course.
Forcing reset pipeline and cache (be careful)¶
If you are using the default broker (Redis), in some test situations you may need to quickly clear all pipelines and caches. Use the following procedure:
redis-cli FLUSHDB
redis-cli FLUSHALL
Error Handling¶
Tool: intelmqdump¶
When bots are failing due to bad input data or programming errors, they can dump the problematic message to a file along with a traceback, if configured accordingly. These dumps are saved in the logging directory as [botid].dump as JSON files. IntelMQ comes with an inspection and reinjection tool called intelmqdump. It is an interactive tool to show all dumped files and the number of dumps per file. Choose a file by bot id or listed numeric id. You can then choose to delete single entries from the file with d 1,3,4, show a message in a more readable format with s 1 (prints the raw message, can be long!), recover some messages and put them back in the pipeline for the bot with a or r 0,4,5, or delete the file with all dumped messages using d.
intelmqdump -h
usage:
intelmqdump [botid]
intelmqdump [-h|--help]
intelmqdump can inspect dumped messages, show, delete or reinject them into
the pipeline. It's an interactive tool, directly start it to get a list of
available dumps or call it with a known bot id as parameter.
positional arguments:
botid botid to inspect dumps of
optional arguments:
-h, --help show this help message and exit
--truncate TRUNCATE, -t TRUNCATE
Truncate raw-data with more characters than given. 0 for no truncating. Default: 1000.
Interactive actions after a file has been selected:
- r, Recover by IDs
> r id{,id} [queue name]
> r 3,4,6
> r 3,7,90 modify-expert-queue
The messages identified by a consecutive numbering will be stored in the
original queue or the given one and removed from the file.
- a, Recover all
> a [queue name]
> a
> a modify-expert-queue
All messages in the opened file will be recovered to the stored or given
queue and removed from the file.
- d, Delete entries by IDs
> d id{,id}
> d 3,5
The entries will be deleted from the dump file.
- d, Delete file
> d
Delete the opened file as a whole.
- s, Show by IDs
> s id{,id}
> s 0,4,5
Show the selected IP in a readable format. It's still a raw format from
repr, but with newlines for message and traceback.
- e, Edit by ID
> e id
> e 0
> e 1,2
Opens an editor (by calling `sensible-editor`) on the message. The modified message is then saved in the dump.
- q, Quit
> q
$ intelmqdump
id: name (bot id) content
0: alienvault-otx-parser 1 dumps
1: cymru-whois-expert 8 dumps
2: deduplicator-expert 2 dumps
3: dragon-research-group-ssh-parser 2 dumps
4: file-output2 1 dumps
5: fraunhofer-dga-parser 1 dumps
6: spamhaus-cert-parser 4 dumps
7: test-bot 2 dumps
Which dump file to process (id or name)? 3
Processing dragon-research-group-ssh-parser: 2 dumps
0: 2015-09-03T13:13:22.159014 InvalidValue: invalid value u'NA' (<type 'unicode'>) for key u'source.asn'
1: 2015-09-01T14:40:20.973743 InvalidValue: invalid value u'NA' (<type 'unicode'>) for key u'source.asn'
(r)ecover by ids, recover (a)ll, delete (e)ntries, (d)elete file, (s)how by ids, (q)uit, edit id (v)? d
Deleted file /opt/intelmq/var/log/dragon-research-group-ssh-parser.dump
Bots and the intelmqdump tool use file locks to prevent writing to already opened files. Bots try to lock the file for up to 60 seconds if the dump file is already locked by another process (intelmqdump) and then give up. intelmqdump does not wait and instead only shows an error message.
By default, the show command truncates the raw field of messages at 1000 characters. To change this limit or disable truncating entirely (value 0), use the --truncate parameter.
Monitoring Logs¶
All bots and intelmqctl log to /opt/intelmq/var/log/ or /var/log/intelmq/ (depending on your installation). In case of failures, messages are dumped to the same directory with the file ending .dump.
tail -f /opt/intelmq/var/log/*.log
tail -f /var/log/intelmq/*.log
Uninstall¶
If you installed intelmq with native packages: Use the package management tool to remove the package intelmq. These tools do not remove configuration by default.
If you installed manually via pip (note that this also deletes all configuration and possibly data):
pip3 uninstall intelmq
rm -r /opt/intelmq
Integration with ticket systems, etc.¶
First of all, IntelMQ is a message (event) processing system: it collects feeds, processes them, enriches them, filters them and then stores them somewhere or sends them to another system. It does this in a composable, data flow oriented fashion, based on single events. There are no aggregation or grouping features. Now, if you want to integrate IntelMQ with your ticket system or some other system, you need to send its output to somewhere where your ticket system or other services can pick up IntelMQ’s data. This could be a database, splunk, or you could send your events directly via email to a ticket system.
Different users came up with different solutions for this, each of them fitting their own organisation. Hence these solutions are not part of the core IntelMQ repository.
CERT.at uses a PostgreSQL DB (sql output bot) and has a small tool intelmqcli which fetches the events marked as "new" in the PostgreSQL DB, groups them and sends them out via the RT ticket system.
Others, including BSI, use a tool called intelmq-mailgen. It sends e-mails to the recipients, optionally PGP-signed, with defined text templates, CSV formatted attachments with grouped events and generated ticket numbers.
The following lists external github repositories which you might consult for examples on how to integrate IntelMQ into your workflow:
If you came up with another solution for integration, we’d like to hear from you! Please reach out to us on the IntelMQ Users Mailinglist.
Frequently Asked Questions¶
Consult the Frequently asked questions if you encountered any problems.
Additional Information¶
Bash Completion¶
To enable bash completion on intelmqctl and intelmqdump in order to help you run the commands in an easy manner, follow the installation process here.
Bots inventory¶
General remarks¶
By default all of the bots are started when you start the whole botnet, however there is a possibility to disable a bot. This means that the bot will not start every time you start the botnet, but you can start and stop the bot if you specify the bot explicitly. To disable a bot, add the following to your runtime.yaml: "enabled": false. Be aware that this is not a normal parameter (like the others described in this file). It is set outside of the parameters object in runtime.yaml. Check out Configuration and Management for an example.
There are two different types of parameters: the initialization parameters are needed to start the bot; the runtime parameters are needed by the bot itself during runtime. The initialization parameters are on the first level, the runtime parameters live in the parameters sub-dictionary:
bot-id:
parameters:
runtime parameters...
initialization parameters...
For example:
abusech-feodo-domains-collector:
parameters:
provider: Abuse.ch
name: Abuse.ch Feodo Domains
http_url: http://example.org/feodo-domains.txt
name: Generic URL Fetcher
group: Collector
module: intelmq.bots.collectors.http.collector_http
description: collect report messages from remote hosts using http protocol
enabled: true
run_mode: scheduled
This configuration resides in the file runtime.yaml in your IntelMQ’s configuration directory for each configured bot.
Initialization parameters¶
name and description: The name and description of the bot. See also intelmqctl list --configured bots.
group: Can be "Collector", "Parser", "Expert" or "Output". Only used for visualization by other tools.
module: The executable (should be in $PATH) which will be started.
enabled: If the parameter is set to true (which is NOT the default value if it is missing as a protection) the bot will start when the botnet is started (intelmqctl start). If the parameter was set to false, the Bot will not be started by intelmqctl start, however you can run the bot independently using intelmqctl start <bot_id>. Check Configuration and Management for more details.
run_mode: There are two run modes, “continuous” (default run mode) or “scheduled”. In the first case, the bot will be running forever until stopped or exits because of errors (depending on configuration). In the latter case, the bot will stop after one successful run. This is especially useful when scheduling bots via cron or systemd. Default is continuous. Check Configuration and Management for more details.
Common parameters¶
Feed parameters¶
Common configuration options for all collectors.
name: Name for the feed (feed.name). In IntelMQ versions smaller than 2.2 the parameter name feed is also supported.
accuracy: Accuracy for the data of the feed (feed.accuracy).
code: Code for the feed (feed.code).
documentation: Link to documentation for the feed (feed.documentation).
provider: Name of the provider of the feed (feed.provider).
rate_limit: time interval (in seconds) between fetching data if applicable.
HTTP parameters¶
Common URL fetching parameters used in multiple bots.
http_timeout_sec: A tuple of floats or only one float describing the timeout of the HTTP connection. Can be a tuple of two floats (read and connect timeout) or just one float (applies for both timeouts). The default is 30 seconds in default.conf, if not given no timeout is used. See also https://requests.readthedocs.io/en/master/user/advanced/#timeouts
http_timeout_max_tries: An integer depicting how often a connection is retried, when a timeout occurred. Defaults to 3 in default.conf.
http_username: username for basic authentication.
http_password: password for basic authentication.
http_proxy: proxy to use for HTTP
https_proxy: proxy to use for HTTPS
http_user_agent: user agent to use for the request.
http_verify_cert: path to trusted CA bundle or directory, false to ignore verifying SSL certificates, or true (default) to verify SSL certificates
ssl_client_certificate: SSL client certificate to use.
ssl_ca_certificate: Optional string of path to trusted CA certificate. Only used by some bots.
http_header: HTTP request headers
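For illustration, a collector combining the feed and HTTP parameters above could be configured like this in runtime.yaml (feed name, provider and URL are hypothetical):
example-feed-collector:
  group: Collector
  name: Generic URL Fetcher
  module: intelmq.bots.collectors.http.collector_http
  parameters:
    name: Example Feed
    provider: Example Provider
    rate_limit: 3600
    http_url: https://example.org/feed.csv
    http_timeout_sec: 30
    http_timeout_max_tries: 3
    http_verify_cert: true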
Cache parameters¶
Common Redis cache parameters used in multiple bots (mainly lookup experts):
redis_cache_host: Hostname of the Redis database.
redis_cache_port: Port of the Redis database.
redis_cache_db: Database number.
redis_cache_ttl: TTL used for caching.
redis_cache_password: Optional password for the Redis database (default: none).
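A sketch of these cache parameters for a lookup expert (bot id and module are hypothetical; database number and TTL are illustrative):
example-cache-expert:
  group: Expert
  name: Example Cache Expert
  module: intelmq.bots.experts.example.expert
  parameters:
    redis_cache_host: 127.0.0.1
    redis_cache_port: 6379
    redis_cache_db: 6
    redis_cache_ttl: 86400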
Collector Bots¶
Multithreading is disabled for all Collectors, as this would lead to duplicated data.
AMQP¶
Requires the pika python library, minimum version 1.0.0.
Information
name: intelmq.bots.collectors.amqp.collector_amqp
lookup: yes
public: yes
cache (redis db): none
description: collect data from (remote) AMQP servers, for both IntelMQ as well as external data
Configuration Parameters
Feed parameters (see above)
connection_attempts: The number of connection attempts to defined server, defaults to 3
connection_heartbeat: Heartbeat to server, in seconds, defaults to 3600
connection_host: Name/IP for the AMQP server, defaults to 127.0.0.1
connection_port: Port for the AMQP server, defaults to 5672
connection_vhost: Virtual host to connect to; on an HTTP(S) connection this would be http://IP/<your virtual host>
expect_intelmq_message: Boolean, if the data is from IntelMQ or not. Default: false. If true, then the data can be any Report or Event and will be passed to the next bot as is. Otherwise a new report is created with the raw data.
password: Password for authentication on your AMQP server
queue_name: The name of the queue to fetch data from
username: Username for authentication on your AMQP server
use_ssl: Use ssl for the connection, make sure to also set the correct port, usually 5671 (true/false)
Currently, only fetching from a queue is supported; this can be extended in the future. Messages are acknowledged at the AMQP server after they have been sent to the pipeline.
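A parameter sketch for fetching external data from a local AMQP server (all values are examples, not defaults to rely on):
parameters:
  connection_host: 127.0.0.1
  connection_port: 5672
  connection_vhost: /
  queue_name: intelmq-input
  username: guest
  password: guest
  use_ssl: false
  expect_intelmq_message: false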
API¶
Information
name: intelmq.bots.collectors.api.collector
lookup: yes
public: yes
cache (redis db): none
description: collect report messages from an HTTP or Socket REST API
Configuration Parameters
Feed parameters (see above)
port: Optional, integer. Default: 5000. The local port, the API will be available at.
use_socket: Optional, boolean. Default: false. If true, the socket will be opened at the location given with socket_path.
socket_path: Optional, string. Default: /tmp/imq_api_default_socket
The API is available at /intelmq/push if the HTTP interface is used (default). The tornado library is required.
Generic URL Fetcher¶
Information
name: intelmq.bots.collectors.http.collector_http
lookup: yes
public: yes
cache (redis db): none
description: collect report messages from remote hosts using HTTP protocol
Configuration Parameters
Feed parameters (see above)
HTTP parameters (see above)
extract_files: Optional, boolean or list of strings. If it is true, the retrieved (compressed) file or archive will be uncompressed/unpacked and the contained files are extracted. If the parameter is a list of strings, only the files matching these filenames are extracted. Extraction handles gzipped files and both compressed and uncompressed tar archives as well as zip archives.
http_url: location of information resource (e.g. https://feodotracker.abuse.ch/blocklist/?download=domainblocklist)
http_url_formatting: (bool|JSON, default: false) If true, {time[format]} will be replaced by the current time in the local timezone, formatted by the given format. E.g. if the URL is http://localhost/{time[%Y]}, then the resulting URL is http://localhost/2019 for the year 2019. (Python's Format Specification Mini-Language is used for this.) You may use a JSON specifying time-delta parameters to shift the current time accordingly. For example, use {"days": -1} for yesterday's date; the URL http://localhost/{time[%Y-%m-%d]} will be translated to "http://localhost/2018-12-31" on the 1st of January 2019.
verify_pgp_signatures: bool, defaults to false. If true, signature file is downloaded and report file is checked. On error (missing signature, mismatch, …), the error is logged and the report is not processed. Public key has to be imported in local keyring. This requires the python-gnupg library.
signature_url: Location of signature file for downloaded content. For path http://localhost/data/latest.json this may be for example http://localhost/data/latest.asc.
signature_url_formatting: (bool|JSON, default: false) The same as http_url_formatting, only for the signature file.
gpg_keyring: string or none (default). If specified, the string represents path to keyring file, otherwise the PGP keyring file for current intelmq user is used.
Zipped files are automatically extracted if detected.
For extracted files, every extracted file is sent in its own report. Every report has a field named extra.file_name with the file name in the archive the content was extracted from.
HTTP Response status code checks
If the HTTP response's status code is not 2xx, this is treated as an error.
In Debug logging level, the request’s and response’s headers and body are logged for further inspection.
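Putting the parameters together, a sketch for fetching a daily CSV feed could look like this (the URL and feed metadata are placeholders):
parameters:
  name: Example Feed
  provider: Example Provider
  rate_limit: 86400
  http_url: "https://example.com/feeds/{time[%Y-%m-%d]}.csv"
  http_url_formatting: true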
Generic URL Stream Fetcher¶
Information
name: intelmq.bots.collectors.http.collector_http_stream
lookup: yes
public: yes
cache (redis db): none
description: Opens a streaming connection to the URL and sends the received lines.
Configuration Parameters
Feed parameters (see above)
HTTP parameters (see above)
strip_lines: boolean, if single lines should be stripped (removing whitespace from the beginning and the end of the line)
If the stream is interrupted, the connection will be aborted using the timeout parameter. No error is logged if the number of consecutive connection failures does not reach the parameter error_max_retries; instead of errors, an INFO message is logged. This is a measure against overly frequent ERROR log messages. The counter of consecutive connection failures is reset whenever a data line has been successfully transferred. If the number of consecutive connection failures reaches the parameter error_max_retries, an exception is thrown and rate_limit applies, if not null.
The parameter http_timeout_max_tries is of no use in this collector.
Generic Mail URL Fetcher¶
Information
name: intelmq.bots.collectors.mail.collector_mail_url
lookup: yes
public: yes
cache (redis db): none
description: collect messages from mailboxes, extract URLs from those messages and download the report messages from the URLs.
Configuration Parameters
Feed parameters (see above)
HTTP parameters (see above)
mail_host: FQDN or IP of mail server
mail_user: user account of the email account
mail_password: password associated with the user account
mail_port: IMAP server port, optional (default: 143 without SSL, 993 for SSL)
mail_ssl: whether the mail account uses SSL (default: true)
folder: folder in which to look for mails (default: INBOX)
subject_regex: regular expression to look for a subject
url_regex: regular expression of the feed URL to search for in the mail body
sent_from: filter messages by sender
sent_to: filter messages by recipient
ssl_ca_certificate: Optional string of path to trusted CA certificate. Applies only to IMAP connections, not HTTP. If the provided certificate is not found, the IMAP connection will fail on handshake. By default, no certificate is used.
The resulting reports contain the following special fields:
feed.url: The URL the data was downloaded from
extra.email_date: The content of the email’s Date header
extra.email_subject: The subject of the email
extra.email_from: The email’s from address
extra.email_message_id: The email’s message ID
extra.file_name: The file name of the downloaded file (extracted from the HTTP Response Headers if possible).
Chunking
For line-based inputs the bot can split up large reports into smaller chunks.
This is particularly important for setups that use Redis as a message queue which has a per-message size limitation of 512 MB.
To configure chunking, set chunk_size to a value in bytes. chunk_replicate_header determines whether the header line should be repeated for each chunk that is passed on to a parser bot.
Specifically, to configure a large file input to work around Redis’ size limitation set chunk_size to something like 384000000, i.e., ~384 MB.
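For example, to stay below the Redis limit as described above, a chunking configuration could look like this sketch:
parameters:
  chunk_size: 384000000
  chunk_replicate_header: true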
Generic Mail Attachment Fetcher¶
Information
name: intelmq.bots.collectors.mail.collector_mail_attach
lookup: yes
public: yes
cache (redis db): none
description: collect messages from mailboxes, download the report messages from the attachments.
Configuration Parameters
Feed parameters (see above)
extract_files: Optional, boolean or list of strings. See documentation of the Generic URL Fetcher for more details.
mail_host: FQDN or IP of mail server
mail_user: user account of the email account
mail_password: password associated with the user account
mail_port: IMAP server port, optional (default: 143 without SSL, 993 for SSL)
mail_ssl: whether the mail account uses SSL (default: true)
folder: folder in which to look for mails (default: INBOX)
subject_regex: regular expression to look for a subject
attach_regex: regular expression of the name of the attachment
attach_unzip: whether to unzip the attachment. Only extracts the first file. Deprecated, use extract_files instead.
sent_from: filter messages by sender
sent_to: filter messages by recipient
ssl_ca_certificate: Optional string of path to trusted CA certificate. Applies only to IMAP connections, not HTTP. If the provided certificate is not found, the IMAP connection will fail on handshake. By default, no certificate is used.
The resulting reports contain the following special fields:
extra.email_date: The content of the email’s Date header
extra.email_subject: The subject of the email
extra.email_from: The email’s from address
extra.email_message_id: The email’s message ID
extra.file_name: The file name of the attachment, or the file name inside the attached archive if the attachment is decompressed.
Generic Mail Body Fetcher¶
Information
name: intelmq.bots.collectors.mail.collector_mail_body
lookup: yes
public: yes
cache (redis db): none
description: collect messages from mailboxes and forward the bodies as reports. Each non-empty body with a matching content type is sent as an individual report.
Configuration Parameters
Feed parameters (see above)
mail_host: FQDN or IP of mail server
mail_user: user account of the email account
mail_password: password associated with the user account
mail_port: IMAP server port, optional (default: 143 without SSL, 993 for SSL)
mail_ssl: whether the mail account uses SSL (default: true)
folder: folder in which to look for mails (default: INBOX)
subject_regex: regular expression to look for a subject
sent_from: filter messages by sender
sent_to: filter messages by recipient
ssl_ca_certificate: Optional string of path to trusted CA certificate. Applies only to IMAP connections, not HTTP. If the provided certificate is not found, the IMAP connection will fail on handshake. By default, no certificate is used.
content_types: Which bodies to use based on the content type. Default: true (all, i.e. ['html', 'plain']). Possible values:
a string with comma-separated values or a list of strings, e.g. ['html', 'plain']
true, false, null: same as the default value
a single string, e.g. 'plain'
The resulting reports contain the following special fields:
extra.email_date: The content of the email’s Date header
extra.email_subject: The subject of the email
extra.email_from: The email’s from address
extra.email_message_id: The email’s message ID
Github API¶
Information
name: intelmq.bots.collectors.github_api.collector_github_contents_api
lookup: yes
public: yes
cache (redis db): none
description: Collects files matched by regular expression from GitHub repository via the GitHub API. Optionally with GitHub credentials, which are used as the Basic HTTP authentication.
Configuration Parameters
Feed parameters (see above)
personal_access_token: GitHub account personal access token. See the GitHub documentation on creating a personal access token: https://developer.github.com/changes/2020-02-14-deprecating-password-auth/#removal
repository: GitHub target repository (<USER>/<REPOSITORY>)
regex: Valid regular expression of target files within the repository (defaults to .*.json)
extra_fields: Comma-separated list of extra fields from GitHub contents API.
Workflow
The optional authentication parameters provide a higher limit for GitHub API requests. With GitHub user authentication, requests are rate-limited to 5000 per hour, otherwise to 60 requests per hour.
The collector recursively searches for regex-defined files in the provided repository. Additionally it adds extra file metadata defined by the extra_fields.
The bot always sets the URL from which the file was downloaded as feed.url.
Fileinput¶
Information
name: intelmq.bots.collectors.file.collector_file
lookup: yes
public: yes
cache (redis db): none
description: This bot is capable of reading files from the local file-system. This is handy for testing purposes, or when you need to react to spontaneous events. In combination with the Generic CSV Parser this should work great.
Configuration Parameters
Feed parameters (see above)
path: path to file
postfix: The postfix (file ending) of the files to look for. For example .csv.
delete_file: whether to delete the file after reading (default: false)
The resulting reports contain the following special fields:
feed.url: The URI using the file:// scheme and localhost, with the full path to the processed file.
extra.file_name: The file name (without path) of the processed file.
Chunking
Additionally, for line-based inputs the bot can split up large reports into smaller chunks.
This is particularly important for setups that use Redis as a message queue which has a per-message size limitation of 512 MB.
To configure chunking, set chunk_size to a value in bytes. chunk_replicate_header determines whether the header line should be repeated for each chunk that is passed on to a parser bot.
Specifically, to configure a large file input to work around Redis’ size limitation set chunk_size to something like 384000000, i.e., ~384 MB.
Workflow
The bot loops over all files in path and tests if their file name matches postfix, e.g. `.csv`. If yes, the file will be read and inserted into the queue.
If delete_file is set, the file will be deleted after processing. If deletion is not possible, the bot will stop.
To prevent data loss, the bot also stops when no postfix is set and delete_file was set. This cannot be overridden.
The bot always sets the file name as feed.url.
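A parameter sketch for watching a local directory for CSV files (the path is a placeholder):
parameters:
  path: /tmp/reports/
  postfix: .csv
  delete_file: false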
Fireeye¶
Information
name: intelmq.bots.collectors.fireeye.collector_fireeye
lookup: yes
public: no
cache (redis db): none
description: This bot is capable of collecting hashes and URLs from a Fireeye MAS appliance.
The Python library xmltodict is required to run this bot.
Configuration Parameters
Feed parameters (see above)
dns_name: DNS name of the target appliance.
request_duration: Length of the query period in the past, e.g. collect alerts from the last 24 or 48 hours.
http_username: Username for authentication.
http_password: Password for authentication.
Workflow
The bot collects all alerts which occurred during the specified duration. After this, a second call is made to check if additional information like domains and hashes is available. After collecting the OpenIOC data, this information is sent to the Fireeye parser.
Kafka¶
Requires the kafka python library.
Information
name: intelmq.bots.collectors.kafka.collector
Configuration parameters
topic: the kafka topic the collector should get messages from
bootstrap_servers: the kafka server(s) the collector should connect to. Defaults to localhost:9092
ssl_check_hostname: false to ignore verifying SSL certificates, or true (default) to verify SSL certificates
ssl_client_certificate: SSL client certificate to use.
ssl_ca_certificate: Optional string of path to trusted CA certificate. Only used by some bots.
MISP Generic¶
Information
name: intelmq.bots.collectors.misp.collector
lookup: yes
public: yes
cache (redis db): none
description: collect messages from MISP, a malware information sharing platform server.
Configuration Parameters
Feed parameters (see above)
misp_url: URL of MISP server (with trailing ‘/’)
misp_key: MISP Authkey
misp_tag_to_process: MISP tag for events to be processed
misp_tag_processed: MISP tag for processed events, optional
Generic parameters used in this bot:
http_verify_cert: Verify the TLS certificate of the server, boolean (default: true)
Workflow
This collector will search for events on a MISP server that have a to_process tag attached to them (see the misp_tag_to_process parameter) and collect them for processing by IntelMQ. Once the MISP event has been processed, the to_process tag is removed from the MISP event and a processed tag is then attached (see the misp_tag_processed parameter).
NB. The MISP tags must be configured to be ‘exportable’ otherwise they will not be retrieved by the collector.
Request Tracker¶
Information
name: intelmq.bots.collectors.rt.collector_rt
lookup: yes
public: yes
cache (redis db): none
description: Request Tracker Collector fetches attachments from an RTIR instance.
You need the rt-library >= 1.9 from nic.cz, available via pypi: pip3 install rt
This bot will connect to RT and inspect the given search_queue for tickets matching all criteria in search_*. Any matches will be inspected: for each match, all (RT) attachments of the matching RT tickets are iterated over, and within this loop the first matching filename in the attachment is processed. If none of the filename matches apply, the content of the first (RT) “history” item is matched against the regular expression for the URL (url_regex).
Configuration Parameters
Feed parameters (see above)
HTTP parameters (see above)
extract_attachment: Optional, boolean or list of strings. See documentation of the Generic URL Fetcher parameter extract_files for more details.
extract_download: Optional, boolean or list of strings. See documentation of the Generic URL Fetcher parameter extract_files for more details.
uri: URL of the REST interface of the RT
user: RT username
password: RT password
search_not_older_than: Absolute time (use ISO format) or relative time, e.g. 3 days.
search_owner: owner of the ticket to search for (default: nobody)
search_queue: queue of the ticket to search for (default: Incident Reports)
search_requestor: the e-mail address of the requestor
search_status: status of the ticket to search for (default: new)
search_subject_like: part of the subject of the ticket to search for (default: Report)
set_status: status to set the ticket to after processing (default: open). false or null to not set a different status.
take_ticket: whether to take the ticket (default: true)
url_regex: regular expression of a URL to search for in the ticket
attachment_regex: regular expression of an attachment in the ticket
unzip_attachment: whether to unzip a found attachment. Only the first file in the archive is used. Deprecated in favor of extract_attachment.
The parameter http_timeout_max_tries is of no use in this collector.
The resulting reports contain the following special fields:
rtir_id: The ticket ID
extra.email_subject and extra.ticket_subject: The subject of the ticket
extra.email_from and extra.ticket_requestors: Comma separated list of the ticket’s requestor’s email addresses.
extra.ticket_owner: The ticket’s owner name
extra.ticket_status: The ticket’s status
extra.ticket_queue: The ticket’s queue
extra.file_name: The name of the extracted file, the name of the downloaded file or the attachments’ filename without .gz postfix.
time.observation: The creation time of the ticket or attachment.
Search
The parameters prefixed with search_ allow configuring the ticket search.
Empty strings and null as value for search parameters are ignored.
File downloads
Attachments can be optionally unzipped, remote files are downloaded with the http_* settings applied.
If url_regex or attachment_regex are empty strings, false or null, they are ignored.
Ticket processing
Optionally, the RT bot can “take” RT tickets (i.e. the user is assigned the ticket) and/or the status can be changed (leave set_status empty in case you don’t want to change the status). Please note, however, that you MUST do one of the following: either “take” the ticket or set the status (set_status). Otherwise, the search will find the ticket every time, generating an endless loop.
In case a resource needs to be fetched and this resource is permanently not available (status code is 4xx), the ticket status will be set according to the configuration to avoid processing the ticket over and over. For temporary failures the status is not modified, instead the ticket will be skipped in this run.
Time search
To find only tickets newer than a given absolute or relative time, you can use the search_not_older_than parameter. An absolute time specification can be anything parseable by dateutil; best use an ISO format.
A relative specification must be in this format: [number] [timespan]s, e.g. 3 days. timespan can be hour, day, week, month or year. A trailing ‘s’ is supported for all timespans. Relative times are subtracted from the current time directly before the search is performed.
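As an illustration, a sketch for fetching zipped CSV attachments from new tickets (credentials, URL and the regular expression are placeholders):
parameters:
  uri: http://rt.example.com/rt/REST/1.0
  user: intelmq
  password: secret
  search_queue: Incident Reports
  search_status: new
  search_not_older_than: 3 days
  attachment_regex: '\.csv\.zip$'
  extract_attachment: true
  take_ticket: true
  set_status: open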
Rsync¶
Requires the rsync executable
Information
name: intelmq.bots.collectors.rsync.collector_rsync
lookup: yes
public: yes
cache (redis db): none
description: Downloads a file via rsync and then loads data from the downloaded file. The downloaded file is located in var/lib/bots/rsync_collector.
Configuration Parameters
Feed parameters (see above)
file: The name of the file to process, combined with rsync_path.
rsync_path: Path to file. It can be “/home/username/directory” or “username@remote_host:/home/username/directory”
temp_directory: The temporary directory for rsync to use for rsync’d files. Optional. Default: $VAR_STATE_PATH/rsync_collector. $VAR_STATE_PATH is /var/run/intelmq/ or /opt/intelmq/var/run/.
Shadowserver Reports API¶
The Cache is required to memorize which files have already been processed (TTL needs to be high enough to cover the oldest files available!).
Information
name: intelmq.bots.collectors.shadowserver.collector_reports_api
description: Connects to the Shadowserver API, requests a list of all the reports for a specific country and processes the ones that are new.
Configuration Parameters
country: Deprecated: The country you want to download the reports for. Will be removed in IntelMQ version 4.0.0, use reports instead.
apikey: Your Shadowserver API key
secret: Your Shadowserver API secret
reports: A list of strings or a comma-separated list of the mailing lists you want to process.
types: A list of strings or a string of comma-separated values with the names of report types you want to process. If you leave this empty, all the available reports will be downloaded and processed (i.e. ‘scan’, ‘drones’, ‘intel’, ‘sandbox_connection’, ‘sinkhole_combined’). The possible report types are equivalent to the file names given in the section Supported Reports of the Shadowserver parser.
Cache parameters (see in section Common parameters, the default TTL is set to 10 days)
The resulting reports contain the following special field:
extra.file_name: The name of the downloaded file, with fixed filename extension. The API returns file names with the extension .csv, although the files are JSON, not CSV. Therefore, for clarity and better error detection in the parser, the file name in extra.file_name uses .json as extension.
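A parameter sketch combining the API credentials with the required cache (the credentials and the reports value are placeholders; the cache database number is an example):
parameters:
  apikey: <your API key>
  secret: <your API secret>
  reports: example-list
  redis_cache_host: 127.0.0.1
  redis_cache_port: 6379
  redis_cache_db: 12
  redis_cache_ttl: 864000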
Shodan Stream¶
Requires the shodan library to be installed.
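The library is available from PyPI; assuming the standard package name, it can be installed with:
pip3 install shodan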
Information
name: intelmq.bots.collectors.shodan.collector_stream
lookup: yes
public: yes
cache (redis db): none
description: Queries the Shodan Streaming API
Configuration Parameters
Feed parameters (see above)
HTTP parameters (see above). Only the proxy is used (requires shodan-python > 1.8.1). Certificate is always verified.
countries: A list of countries to query for. If it is a string, it will be split by ,.
If the stream is interrupted, the connection will be aborted using the timeout parameter. No error is logged if the number of consecutive connection failures does not reach the parameter error_max_retries; instead of errors, an INFO message is logged. This is a measure against overly frequent ERROR log messages. The counter of consecutive connection failures is reset whenever a data line has been successfully transferred. If the number of consecutive connection failures reaches the parameter error_max_retries, an exception is thrown and rate_limit applies, if not null.
TCP¶
Information
name: intelmq.bots.collectors.tcp.collector
lookup: no
public: yes
cache (redis db): none
description: TCP is the bot responsible for receiving events on a TCP port (e.g. from the TCP Output of another IntelMQ instance). Might not work on Python 3.4.6.
Configuration Parameters
ip: IP of destination server
port: port of destination server
Response
The TCP collector just sends an “Ok” message after every received message; this should not pose a problem for arbitrary input. If you intend to link two IntelMQ instances via TCP, have a look at the TCP output bot documentation.
Alien Vault OTX¶
Information
name: intelmq.bots.collectors.alienvault_otx.collector
lookup: yes
public: yes
cache (redis db): none
description: collect report messages from Alien Vault OTX API
Requirements
Install the library from GitHub, as there is no package in PyPi:
pip3 install -r intelmq/bots/collectors/alienvault_otx/REQUIREMENTS.txt
Configuration Parameters
Feed parameters (see above)
api_key: API Key
modified_pulses_only: get only modified pulses instead of all; set it to true or false, default false
interval: if modified_pulses_only is set, define the time in hours (integer value) to get pulses modified since then, default 24 hours
Blueliv Crimeserver¶
Information
name: intelmq.bots.collectors.blueliv.collector_crimeserver
lookup: yes
public: no
cache (redis db): none
description: collect report messages from Blueliv API
For more information visit https://github.com/Blueliv/api-python-sdk
Requirements
Install the required library:
pip3 install -r intelmq/bots/collectors/blueliv/REQUIREMENTS.txt
Configuration Parameters
Feed parameters (see above)
api_key: your API key, see https://map.blueliv.com/?redirect=get-started#signup
api_url: The optional API endpoint, by default https://freeapi.blueliv.com.
Calidog Certstream¶
A bot to collect data from the Certificate Transparency Log (CTL). This bot works based on the certstream library (https://github.com/CaliDog/certstream-python).
Information
name: intelmq.bots.collectors.calidog.collector_certstream
lookup: yes
public: no
cache (redis db): none
description: collect data from Certificate Transparency Log
Configuration Parameters
Feed parameters (see above)
ESET ETI¶
Information
name: intelmq.bots.collectors.eset.collector
lookup: yes
public: no
cache (redis db): none
description: collect data from ESET ETI TAXII server
For more information visit https://www.eset.com/int/business/services/threat-intelligence/
Requirements
Install the required cabby library:
pip3 install -r intelmq/bots/collectors/eset/REQUIREMENTS.txt
Configuration Parameters
Feed parameters (see above)
username: Your username
password: Your password
endpoint: eti.eset.com
time_delta: The time span to look back, in seconds. Default 3600.
collection: The collection to fetch.
McAfee openDXL¶
Information
name: intelmq.bots.collectors.opendxl.collector
lookup: yes
public: no
cache (redis db): none
description: collect messages via openDXL
Configuration Parameters
Feed parameters (see above)
dxl_config_file: location of the configuration file containing the required information to connect
dxl_topic: the name of the DXL topic to subscribe
Microsoft Azure¶
Iterates over all blobs in all containers in an Azure storage. The Cache is required to memorize which files have already been processed (TTL needs to be high enough to cover the oldest files available!).
This bot significantly changed in a backwards-incompatible way in IntelMQ Version 2.2.0 to support current versions of the Microsoft Azure Python libraries.
azure-storage-blob>=12.0.0
is required.
Information
name: intelmq.bots.collectors.microsoft.collector_azure
lookup: yes
public: no
cache (redis db): 5
description: collect blobs from Microsoft Azure using their library
Configuration Parameters
Cache parameters (see above)
Feed parameters (see above)
connection_string: connection string as given by Microsoft
container_name: name of the container to connect to
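A parameter sketch (the connection string and container name are placeholders; the cache database number follows the information block above):
parameters:
  connection_string: <connection string as given by Microsoft>
  container_name: <container name>
  redis_cache_host: 127.0.0.1
  redis_cache_port: 6379
  redis_cache_db: 5
  redis_cache_ttl: 864000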
Microsoft Interflow¶
Iterates over all files available by this API. Make sure to limit the files to be downloaded with the parameters, otherwise you will get a lot of data! The cache is used to remember which files have already been downloaded. Make sure the TTL is high enough, higher than not_older_than.
Information
name: intelmq.bots.collectors.microsoft.collector_interflow
lookup: yes
public: no
cache (redis db): 5
description: collect files from Microsoft Interflow using their API
Configuration Parameters
Feed parameters (see above)
api_key: API key generated in their portal
file_match: an optional regular expression to match file names
not_older_than: an optional relative (minutes) or absolute time (UTC is assumed) expression to determine the oldest time of a file to be downloaded
redis_cache_* and especially redis_cache_ttl: Settings for the cache where file names of downloaded files are saved. The cache’s TTL must always be bigger than not_older_than.
Additional functionalities
Files are automatically ungzipped if the filename ends with .gz.
Stomp¶
Information
name: intelmq.bots.collectors.stomp.collector
lookup: yes
public: no
cache (redis db): none
description: collect messages from a stomp server
Requirements
Install the stomp.py library from PyPI:
pip3 install -r intelmq/bots/collectors/stomp/REQUIREMENTS.txt
Configuration Parameters
Feed parameters (see above)
exchange: exchange point
port: 61614
server: hostname e.g. “n6stream.cert.pl”
ssl_ca_certificate: path to CA file
ssl_client_certificate: path to client cert file
ssl_client_certificate_key: path to client cert key file
Twitter¶
Collects tweets from target_timelines: up to tweet_count tweets from each user and up to timelimit back in time. The tweet text is sent separately and, if allowed, links to pastebin are followed and the text is sent in a separate report.
Information
name: intelmq.bots.collectors.twitter.collector_twitter
lookup: yes
public: yes
cache (redis db): none
description: Collects tweets
Configuration Parameters
Feed parameters (see above)
target_timelines: screen_names of twitter accounts to be followed
tweet_count: number of tweets to be taken from each account
timelimit: maximum age of the tweets collected in seconds
follow_urls: list of screen_names for which URLs will be followed
exclude_replies: exclude replies of the followed screen_names
include_rts: whether to include retweets by given screen_name
consumer_key: Twitter API login data
consumer_secret: Twitter API login data
access_token_key: Twitter API login data
access_token_secret: Twitter API login data
API collector bot¶
Information
name: intelmq.bots.collectors.api.collector_api
lookup: no
public: no
cache (redis db): none
description: Bot for collecting data via an API; you need to POST JSON to the /intelmq/push endpoint
example usage:
curl -X POST http://localhost:5000/intelmq/push -H 'Content-Type: application/json' --data '{"source.ip": "127.0.0.101", "classification.type": "system-compromise"}'
Configuration Parameters
Feed parameters (see above)
port: 5000
Parser Bots¶
Not complete¶
This list is not complete. Look at intelmqctl list bots or the list of parsers shown in the manager. Most parsers do not need configuration parameters.
Configuration Parameters
default_fields: map of statically added fields to each event (only applied if parsing the event doesn’t set the value)
example usage:
default_fields:
classification.type: c2-server
protocol.transport: tcp
AnubisNetworks Cyberfeed Stream¶
Information
name: intelmq.bots.parsers.anubisnetworks.parser
lookup: no
public: yes
cache (redis db): none
description: parses data from the AnubisNetworks Cyberfeed Stream
Description
The feed format changes over time. The parser supports at least data from 2016 and 2020.
Events with the Malware “TestSinkholingLoss” are ignored, as they are for the feed provider’s internal purpose only and should not be processed at all.
Configuration parameters
use_malware_familiy_as_classification_identifier: default: true. Use the malw.family field as classification.identifier. If false, check if it is the same as malw.variant. If it is the same, it is ignored. Otherwise it is saved as extra.malware.family.
Generic CSV Parser¶
Information
name: intelmq.bots.parsers.generic.parser_csv
lookup: no
public: yes
cache (redis db): none
description: Parses CSV data
Lines starting with ‘#’ will be ignored. Headers won’t be interpreted.
Configuration parameters
"columns": A list of strings or a string of comma-separated values with field names. The names must match the IntelMQ Data Format field names. Empty column specifications and columns named "__IGNORE__" are ignored. E.g.
"columns": ["", "source.fqdn", "extra.http_host_header", "__IGNORE__"]
is equivalent to:
"columns": ",source.fqdn,extra.http_host_header,"
The first and the last column are not used in this example.
It is possible to specify multiple columns using the | character. E.g.
"columns": "source.url|source.fqdn|source.ip"
First, the bot will try to parse the value as a URL; if that fails, it will try to parse it as an FQDN; if that fails, as an IP address; and if that also fails, an error will be raised. Some use cases:
mixed data set, e.g. URL/FQDN/IP/NETMASK: "columns": "source.url|source.fqdn|source.ip|source.network"
parse a value and ignore if it fails: "columns": "source.url|__IGNORE__"
"column_regex_search": Optional. A dictionary mapping field names (as given per the columns parameter) to regular expressions. The field is evaluated using re.search. E.g. to get the ASN out of AS1234 use: {"source.asn": "[0-9]*"}. Make sure to properly escape any backslashes in your regular expression (see also #1579).
"compose_fields": Optional, dictionary. Create fields from columns, e.g. with data like this:
# Host,Path
example.com,/foo/
example.net,/bar/
using this compose_fields parameter:
{"source.url": "http://{0}{1}"}
you get:
http://example.com/foo/
http://example.net/bar/
in the respective source.url fields. The value in the dictionary mapping is formatted, whereas the columns are available with their index.
"default_url_protocol": For URLs you can give a default protocol which will be prepended to the data.
“delimiter”: separation character of the CSV, e.g. “,”
"skip_header": Boolean or integer, skip the first N lines of the file (true -> 1, false -> 0), optional. Lines starting with # will be skipped additionally; make sure you do not skip more lines than needed!
time_format: Optional. If “timestamp”, “windows_nt” or “epoch_millis” the time will be converted first. With the default null fuzzy time parsing will be used.
“type”: set the classification.type statically, optional
"data_type": sets the data of a specific type; currently only "json" is a supported value. An example:
{"columns": ["source.ip", "source.url", "extra.tags"], "data_type": "{\"extra.tags\":\"json\"}"}
This will ensure extra.tags is treated as JSON.
“filter_text”: only process the lines containing or not containing specified text, to be used in conjunction with filter_type
“filter_type”: value can be whitelist or blacklist. If whitelist, only lines containing the text in filter_text will be processed, if blacklist, only lines NOT containing the text will be processed.
To process ipset format files use:
{"filter_text": "ipset add ", "filter_type": "whitelist", "columns": ["__IGNORE__", "__IGNORE__", "__IGNORE__", "source.ip"]}
"type_translation": If the source does have a field with information for classification.type, but it does not correspond to IntelMQ’s types, you can map them to the correct ones. The type_translation field can hold a dictionary, or a string with a JSON dictionary which maps the feed’s values to IntelMQ’s. Example:
{"malware_download": "malware-distribution"}
"columns_required": A list of true/false for each column. By default, it is true for every column.
Calidog Certstream¶
Information
name: intelmq.bots.parsers.calidog.parser_certstream
lookup: no
public: yes
cache (redis db): none
description: parses data from the Certificate Transparency Log
Description
For each domain in the leaf_cert.all_domains object one event with the domain in source.fqdn (and source.ip as fallback) is produced. The seen-date is saved in time.source and the classification type is other.
Feed parameters (see above)
ESET¶
Information
name: intelmq.bots.parsers.eset.parser
lookup: no
public: yes
cache (redis db): none
description: Parses data from ESET ETI TAXII server
Description
Supported collections:
“ei.urls (json)”
“ei.domains v2 (json)”
Cymru CAP Program¶
Information
name: intelmq.bots.parsers.cymru.parser_cap_program
public: no
cache (redis db): none
description: Parses data from Cymru’s CAP program feed.
Description
There are two different feeds available:
infected_$date.txt (“old”)
$certname_$date.txt (“new”)
The new one will replace the old one at some point in time; currently you need to fetch both. The parser handles both formats.
Old feed
As little information on the format is available, the mappings might not be correct in all cases. Some reports are not implemented at all, as there is no data available to check whether the parsing is correct. If you do get errors like Report … not implement or similar, please open an issue and report the (anonymized) example data. Thanks.
The information about the event could be better in many cases but as Cymru does not want to be associated with the report, we can’t add comments to the events in the parser, because then the source would be easily identifiable for the recipient.
Cymru Full Bogons¶
http://www.team-cymru.com/bogon-reference.html
Information
name: intelmq.bots.parsers.cymru.parser_full_bogons
public: no
cache (redis db): none
description: Parses data from full bogons feed.
Github Feed¶
Information
name: intelmq.bots.parsers.github_feed.parser
description: Parses Feeds available publicly on GitHub (should receive from github_api collector)
Have I Been Pwned Callback Parser¶
Information
name: intelmq.bots.parsers.hibp.parser_callback
public: no
cache (redis db): none
description: Parses data from Have I Been Pwned feed.
Description
Parses the data from a callback of a Have I Been Pwned Enterprise Subscription.
Parses breaches and pastes and creates one event per e-mail address. The e-mail address is stored in source.account. classification.type is leak and classification.identifier is breach or paste.
HTML Table Parser¶
name: intelmq.bots.parsers.html_table.parser
public: yes
cache (redis db): none
description: Parses tables in HTML documents
Configuration parameters
"columns": A list of strings or a string of comma-separated values with field names. The names must match the IntelMQ Data Format field names. Empty column specifications and columns named "__IGNORE__" are ignored. E.g.
"columns": ["", "source.fqdn", "extra.http_host_header", "__IGNORE__"]
is equivalent to:
"columns": ",source.fqdn,extra.http_host_header,"
The first and the last column are not used in this example.
It is possible to specify multiple columns using the | character. E.g.
"columns": "source.url|source.fqdn|source.ip"
First, the bot will try to parse the value as a URL; if that fails, it will try to parse it as an FQDN; if that fails, as an IP address; and if that also fails, an error will be raised. Some use cases:
mixed data set, e.g. URL/FQDN/IP/NETMASK: "columns": "source.url|source.fqdn|source.ip|source.network"
parse a value and ignore if it fails: "columns": "source.url|__IGNORE__"
"ignore_values": A list of strings or a string of comma-separated values which will not be considered while assigning to the corresponding fields given in columns. E.g.
"ignore_values": ["", "unknown", "Not listed"]
is equivalent to:
"ignore_values": ",unknown,Not listed"
The following configuration will lead to assigning all values to malware.name and extra.SBL except unknown and Not listed respectively:
"columns": ["source.url", "malware.name", "extra.SBL"], "ignore_values": ["", "unknown", "Not listed"]
The parameters columns and ignore_values must have the same length.
“attribute_name”: Filtering table with table attributes, to be used in conjunction with attribute_value, optional. E.g. class, id, style.
"attribute_value": String. To filter all tables with attribute class='details' use:
"attribute_name": "class", "attribute_value": "details"
"table_index": Index of the table if multiple tables are present. If attribute_name and attribute_value are given, the index refers to the tables remaining after filtering with the table attribute. Default: 0.
“split_column”: Padded column to be split to get values, to be used in conjunction with split_separator and split_index, optional.
“split_separator”: Delimiter string for padded column.
"split_index": Index of the unpadded string in the list returned from splitting split_column with split_separator as delimiter string. Default: 0.
E.g.
"split_column": "source.fqdn", "split_separator": " ", "split_index": 1
With the above configuration, a column corresponding to source.fqdn with value [D] lingvaworld.ru will be assigned as "source.fqdn": "lingvaworld.ru".
“skip_table_head”: Boolean, skip the first row of the table, optional. Default: true.
"default_url_protocol": For URLs you can give a default protocol which will be prepended to the data. Default: "http://".
“time_format”: Optional. If “timestamp”, “windows_nt” or “epoch_millis” the time will be converted first. With the default null fuzzy time parsing will be used.
“type”: set the classification.type statically, optional
“html_parser”: The HTML parser to use, by default “html.parser”, can also be e.g. “lxml”, have a look at https://www.crummy.com/software/BeautifulSoup/bs4/doc/
Key-Value Parser¶
Information
name: intelmq.bots.parsers.key_value.parser
lookup: no
public: no
cache (redis db): none
description: Parses text lines in key=value format, for example FortiGate firewall logs.
Configuration Parameters
pair_separator: String separating key=value pairs, default ” “ (space).
kv_separator: String separating key and value, default =.
keys: Array of string->string, names of keys to propagate mapped to IntelMQ event fields. Example:
"keys": { "srcip": "source.ip", "dstip": "destination.ip" }
The value mapped to time.source is parsed. If the value is numeric, it is interpreted directly. Otherwise, or if that fails, it is parsed fuzzily using dateutil. If the value cannot be parsed, a warning is logged per line.
strip_quotes: Boolean, remove opening and closing quotes from values, default true.
Parsing limitations
The input must not have (quoted) occurrences of the separator in the values. For example, this is not parsable (with space as separator):
key="long value" key2="other value"
In firewall logs like FortiGate, this does not occur. These logs usually look like:
srcip=192.0.2.1 srcmac="00:00:5e:00:17:17"
McAfee Advanced Threat Defense File¶
Information
name: intelmq.bots.parsers.mcafee.parser_atd
lookup: yes
public: no
cache (redis db): none
description: Parse IoCs from McAfee Advanced Threat Defense reports (hash, IP, URL)
Configuration Parameters
Feed parameters (see above)
verdict_severity: min report severity to parse
Microsoft CTIP Parser¶
name: intelmq.bots.parsers.microsoft.parser_ctip
public: no
cache (redis db): none
description: Parses data from the Microsoft CTIP Feed
Configuration Parameters
overwrite: If an existing feed.name should be overwritten with the DataFeed of the source (only relevant for the azure data source).
Description
Can parse the JSON format provided by the Interflow interface (lists of dictionaries) as well as the format provided by the Azure interface (one dictionary per line). The provided data differs between the two formats/providers.
The parser is capable of parsing both feeds:
ctip-c2
ctip-infected-summary
The feeds only differ in a few fields, not in the format.
The feeds contain a field called Payload which is nearly always a base64 encoded JSON structure. If decoding works, the contained fields are saved as extra.payload.*, otherwise the field is saved as extra.payload.text.
MISP¶
name: intelmq.bots.parsers.misp.parser
public: no
cache (redis db): none
description: Parses MISP events
Description
MISP events collected by the MISPCollectorBot are passed to this parser for processing. Supported MISP event categories and attribute types are defined in the SUPPORTED_MISP_CATEGORIES and MISP_TYPE_MAPPING class constants.
n6¶
Information
name: intelmq.bots.parsers.n6.parser_n6stomp
public: no
cache (redis db): none
description: Convert n6 data into IntelMQ format.
Configuration Parameters
None
Description
Test messages are ignored; this is logged at debug logging level. The parser also contains a mapping for the classification (resulting in taxonomy, type and identifier). The name field is normally used as malware.name. If that fails due to disallowed characters, these characters are removed and the original value is saved as event_description.text. This can happen for names like “further iocs: text with invalid ’ char”.
If an n6 message contains multiple IP addresses, multiple events are generated, resulting in events only differing in the address information.
Twitter¶
Information
name: intelmq.bots.parsers.twitter.parser
public: no
cache (redis db): none
description: Extracts URLs from text, fuzzy, aimed at parsing tweets
Configuration Parameters
domain_whitelist: domains to be filtered out
substitutions: semicolon-delimited list of even length of pairs of substitutions (for example: ‘[.];.;,;.’ substitutes ‘[.]’ with ‘.’ and ‘,’ with ‘.’)
classification_type: string with a valid classification type as defined in data format
default_scheme: Default scheme for URLs if not given. See also the next section.
Default scheme
The dependency url-normalize changed its behavior in version 1.4.0 from using http:// as the default scheme to https://. Version 1.4.1 added the possibility to specify it. Thus you can only use the default_scheme parameter with a current version of this library >= 1.4.1; with 1.4.0 you will always get https:// as the default scheme, and for older versions < 1.4.0, http:// is used.
This does not affect URLs which already include the scheme.
Shadowserver¶
There are two Shadowserver parsers, one for data in CSV format (intelmq.bots.parsers.shadowserver.parser) and one for data in JSON format (intelmq.bots.parsers.shadowserver.parser_json).
The latter was added in IntelMQ 2.3 and is meant to be used together with the Shadowserver API collector.
Information
name: intelmq.bots.parsers.shadowserver.parser (for CSV data) or intelmq.bots.parsers.shadowserver.parser_json (for JSON data)
public: yes
description: Parses different reports from Shadowserver.
Configuration Parameters
feedname: Optional, the name of the feed, see the list below for possible values.
overwrite: If an existing feed.name should be overwritten.
How this bot works
There are two possibilities for the bot to determine which feed the data belongs to in order to determine the correct mapping of the columns:
Automatic feed detection
Since IntelMQ version 2.1 the parser can detect the feed based on metadata provided by the collector.
When processing a report, this bot takes extra.file_name from the report and looks up in _config.py how the report should be parsed.
If this lookup is not possible, and the feed name is not given as a parameter, the feed cannot be parsed.
The field extra.file_name has the following structure: %Y-%m-%d-${report_name}[-suffix].csv where suffix can be something like country-geo. For example, some possible filenames are 2019-01-01-scan_http-country-geo.csv or 2019-01-01-scan_tftp.csv. The important part is ${report_name}, between the date and the suffix. Since version 2.1.2 the date in the filename is optional, so filenames like scan_tftp.csv are also detected.
Fixed feed name
If the method above is not possible and for upgraded instances, the feed can be set with the feedname parameter. Feed-names are derived from the subjects of the Shadowserver E-Mails. A list of possible feeds can be found in the table below in the column “feed name”.
Supported reports
These are the supported feed name and their corresponding file name for automatic detection:
feed name | file name
Accessible-ADB | scan_adb
Accessible-AFP | scan_afp
Accessible-AMQP | scan_amqp
Accessible-ARD | scan_ard
Accessible-Cisco-Smart-Install | cisco_smart_install
Accessible-CoAP | scan_coap
Accessible-CWMP | scan_cwmp
Accessible-MS-RDPEUDP | scan_msrdpeudp
Accessible-FTP | scan_ftp
Accessible-Hadoop | scan_hadoop
Accessible-HTTP | scan_http
Accessible-Radmin | scan_radmin
Accessible-RDP | scan_rdp
Accessible-Rsync | scan_rsync
Accessible-SMB | scan_smb
Accessible-Telnet | scan_telnet
Accessible-Ubiquiti-Discovery-Service | scan_ubiquiti
Accessible-VNC | scan_vnc
Blacklisted-IP (deprecated) | blacklist
Blocklist | blocklist
Compromised-Website | compromised_website
Device-Identification IPv4 / IPv6 | device_id/device_id6
DNS-Open-Resolvers | scan_dns
Honeypot-Amplification-DDoS-Events | event4_honeypot_ddos_amp
Honeypot-Brute-Force-Events | event4_honeypot_brute_force
Honeypot-Darknet | event4_honeypot_darknet
Honeypot-HTTP-Scan | event4_honeypot_http_scan
HTTP-Scanners | hp_http_scan
ICS-Scanners | hp_ics_scan
IP-Spoofer-Events | event4_ip_spoofer
Microsoft-Sinkhole-Events IPv4 | event4_microsoft_sinkhole
Microsoft-Sinkhole-Events-HTTP IPv4 | event4_microsoft_sinkhole_http
NTP-Monitor | scan_ntpmonitor
NTP-Version | scan_ntp
Open-Chargen | scan_chargen
Open-DB2-Discovery-Service | scan_db2
Open-Elasticsearch | scan_elasticsearch
Open-IPMI | scan_ipmi
Open-IPP | scan_ipp
Open-LDAP | scan_ldap
Open-LDAP-TCP | scan_ldap_tcp
Open-mDNS | scan_mdns
Open-Memcached | scan_memcached
Open-MongoDB | scan_mongodb
Open-MQTT | scan_mqtt
Open-MSSQL | scan_mssql
Open-NATPMP | scan_nat_pmp
Open-NetBIOS-Nameservice | scan_netbios
Open-Netis | netis_router
Open-Portmapper | scan_portmapper
Open-QOTD | scan_qotd
Open-Redis | scan_redis
Open-SNMP | scan_snmp
Open-SSDP | scan_ssdp
Open-TFTP | scan_tftp
Open-XDMCP | scan_xdmcp
Outdated-DNSSEC-Key | outdated_dnssec_key
Outdated-DNSSEC-Key-IPv6 | outdated_dnssec_key_v6
Sandbox-URL | cwsandbox_url
Sinkhole-DNS | sinkhole_dns
Sinkhole-Events | event4_sinkhole/event6_sinkhole
Sinkhole-Events IPv4 | event4_sinkhole
Sinkhole-Events IPv6 | event6_sinkhole
Sinkhole-HTTP-Events | event4_sinkhole_http/event6_sinkhole_http
Sinkhole-HTTP-Events IPv4 | event4_sinkhole_http
Sinkhole-HTTP-Events IPv6 | event6_sinkhole_http
Sinkhole-Events-HTTP-Referer | event4_sinkhole_http_referer/event6_sinkhole_http_referer
Sinkhole-Events-HTTP-Referer IPv4 | event4_sinkhole_http_referer
Sinkhole-Events-HTTP-Referer IPv6 | event6_sinkhole_http_referer
Spam-URL | spam_url
SSL-FREAK-Vulnerable-Servers | scan_ssl_freak
SSL-POODLE-Vulnerable-Servers | scan_ssl_poodle/scan6_ssl_poodle
Vulnerable-Exchange-Server * | scan_exchange
Vulnerable-ISAKMP | scan_isakmp
Vulnerable-HTTP | scan_http
Vulnerable-SMTP | scan_smtp_vulnerable
* This report can also contain data on active webshells (the column tag is exchange;webshell); such hosts are therefore not only vulnerable but also actively infected.
In addition, the following legacy reports are supported:
feed name | successor feed name | file name
Amplification-DDoS-Victim | Honeypot-Amplification-DDoS-Events | ddos_amplification
CAIDA-IP-Spoofer | IP-Spoofer-Events | caida_ip_spoofer
Darknet | Honeypot-Darknet | darknet
Drone | Sinkhole-Events | botnet_drone
Drone-Brute-Force | Honeypot-Brute-Force-Events, Sinkhole-HTTP-Events | drone_brute_force
Microsoft-Sinkhole | Sinkhole-HTTP-Events | microsoft_sinkhole
Sinkhole-HTTP-Drone | Sinkhole-HTTP-Events | sinkhole_http_drone
IPv6-Sinkhole-HTTP-Drone | Sinkhole-HTTP-Events | sinkhole6_http
More information on these legacy reports can be found in Changes in Sinkhole and Honeypot Report Types and Formats.
Development
Structure of this Parser Bot
The parser consists of two files:
_config.py
parser.py or parser_json.py
Both files are required for the parser to work properly.
Add new Feedformats
Add a new feed format and conversions, if required, to the file _config.py. Don’t forget to update the mapping dict; it is required to look up the correct configuration. Look at the documentation in the bot’s _config.py file for more information.
Shodan¶
Information
name: intelmq.bots.parsers.shodan.parser
public: yes
description: Parses data from Shodan (search, stream etc).
The parser is by far not complete, as there are a lot of fields in a big nested structure. There is a minimal mode available which only parses the important/most useful fields and also saves everything in extra.shodan, keeping the original structure. When not using the minimal mode, it may be useful to ignore errors, as many parsing errors can happen with the incomplete mapping.
Configuration Parameters
ignore_errors: Boolean (default true)
minimal_mode: Boolean (default false)
ZoneH¶
Information
name: intelmq.bots.parsers.zoneh.parser
public: yes
description: Parses data from ZoneH.
Description
This bot is designed to consume defacement reports from zone-h.org. It expects fields normally present in CSV files distributed by email.
Expert Bots¶
Abusix¶
Information
name: intelmq.bots.experts.abusix.expert
lookup: dns
public: yes
cache (redis db): 5
description: RIPE abuse contacts resolving through DNS TXT queries
Configuration Parameters
Cache parameters (see in section Common parameters)
Requirements
This bot can optionally use the python module querycontacts by Abusix itself: https://pypi.org/project/querycontacts/
pip3 install querycontacts
If the package is not installed, our own routines are used.
Aggregate¶
Information
name: intelmq.bots.experts.aggregate.expert
lookup: no
public: yes
cache (redis db): 8
description: Aggregates events based upon given fields & timespan
Configuration Parameters
Cache parameters (see in section Common parameters)
TTL is not used; using it would result in data loss.
fields: Given fields which are used to aggregate, e.g. classification.type, classification.identifier
threshold: If the number of events aggregated within the timespan is lower than the given threshold, the events will get dropped.
timespan: Timespan during which events are aggregated, e.g. 1 hour
Usage
Define the specific fields used to filter incoming events and aggregate them. Also set the timespan over which you want the events to be aggregated, e.g. 1 hour. A parameter sketch follows this paragraph.
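(The field selection and values below are illustrative; the fields value is assumed to be a comma-separated string, and the cache database number follows the information block above.)
parameters:
  fields: classification.type,classification.identifier
  threshold: 10
  timespan: 1 hour
  redis_cache_host: 127.0.0.1
  redis_cache_port: 6379
  redis_cache_db: 8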
Note
The “cleanup” procedure sends out the aggregated events or drops them based upon the given threshold value. It is called on every incoming message and on the bot’s initialization. If you are potentially running on low traffic (no incoming events within the given timespan), it is recommended to reload or restart the bot via a cronjob every 30 minutes (adapt to your configured timespan). Otherwise you might lose information.
E.g.:
crontab -e
0,30 * * * * intelmqctl reload my-aggregate-bot
For reloading/restarting please check the intelmqctl documentation.
ASN Lookup¶
Information
name: intelmq.bots.experts.asn_lookup.expert
lookup: local database
public: yes
cache (redis db): none
description: IP to ASN
Configuration Parameters
database: Path to the downloaded database.
Requirements
Install pyasn module
pip3 install pyasn
Database
Use this command to create/update the database and reload the bot:
intelmq.bots.experts.asn_lookup.expert --update-database
The database is fetched from routeviews.org and licensed under the Creative Commons Attribution 4.0 International license (see the routeviews FAQ).
CSV Converter¶
Information
name: intelmq.bots.experts.csv_converter.expert
lookup: no
public: yes
cache (redis db): none
description: Converts an event to CSV format, saved in the output field.
Configuration Parameters
delimiter: String, default “,”
fieldnames: Comma-separated list of field names, e.g. “time.source,classification.type,source.ip”
Usage
To use the CSV-converted data in an output bot, for example a file output, use the configuration parameter single_key of the output bot and set it to output.
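A parameter sketch (the field selection is illustrative):
parameters:
  delimiter: ","
  fieldnames: "time.source,classification.type,source.ip"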
Cymru Whois¶
Information
name: intelmq.bots.experts.cymru_whois.expert
lookup: Cymru DNS
public: yes
cache (redis db): 5
description: IP to geolocation, ASN, BGP prefix
Public documentation: https://www.team-cymru.com/IP-ASN-mapping.html#dns
Configuration Parameters
Cache parameters (see in section Common parameters)
overwrite: Overwrite existing fields. Default: true if not given (for backwards compatibility, will change in version 3.0.0)
RemoveAffix¶
Information
name: intelmq.bots.experts.remove_affix.expert
lookup: none
public: yes
cache (redis db): none
description: Cut string from string
Configuration Parameters
remove_prefix: True - cut from start, False - cut from end
affix: example ‘www.’
field: example field ‘source.fqdn’
Description
Remove part of a string from a string, for example www. from domains.
Domain Suffix¶
This bot adds the public suffix of a domain to the event. For more information on the public suffix list, see: https://publicsuffix.org/list/ Only rules for ICANN domains are processed. The list can (and should) contain Unicode data; punycode conversion is done during reading.
Note that the public suffix is not the same as the top level domain (TLD). E.g. co.uk is a public suffix, but the TLD is uk. Privately registered suffixes (such as blogspot.co.at), which are part of the public suffix list too, are ignored.
Information
name: intelmq.bots.experts.domain_suffix.expert
lookup: no
public: yes
cache (redis db): -
description: extracts the domain suffix from the FQDN
Configuration Parameters
field: either “fqdn” or “reverse_dns”
suffix_file: path to the suffix file
Rule processing
A short summary how the rules are processed:
The simple ones:
com
at
gv.at
example.com leads to com, example.gv.at leads to gv.at.
Wildcards:
*.example.com
www.example.com leads to www.example.com.
And additionally the exceptions, together with the above wildcard rule:
!www.example.com
www.example.com does now not lead to www.example.com, but to example.com.
Database
Use this command to create/update the database and reload the bot:
intelmq.bots.experts.domain_suffix.expert --update-database
Domain valid¶
Information
name: intelmq.bots.experts.domain_valid.expert
lookup: no
public: yes
cache (redis db): none
description: Checks if a domain is valid by performing multiple validity checks (see below).
Configuration Parameters
domain_field: The name of the field to be validated.
tlds_domains_list: local file with all valid TLDs, default location
/opt/intelmq/var/lib/bots/domain_valid/tlds-alpha-by-domain.txt
Description
If the field given in domain_field does not exist in the event, the event is dropped.
If the domain contains underscores (_), the event is dropped.
If the domain is not valid according to the validators library, the event is dropped.
If the domain’s last part (the TLD) is not in the TLD list configured by the parameter tlds_domains_list, the field is dropped.
Latest TLD list: https://data.iana.org/TLD/
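A minimal configuration sketch, using the default TLD list location mentioned above (the field name is an example):
parameters:
  domain_field: "source.fqdn"
  tlds_domains_list: "/opt/intelmq/var/lib/bots/domain_valid/tlds-alpha-by-domain.txt"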
Deduplicator¶
Information
name: intelmq.bots.experts.deduplicator.expert
lookup: redis cache
public: yes
cache (redis db): 6
description: Bot responsible for ignoring duplicated messages. The bot can be configured to perform deduplication by looking only at specific fields of the message.
Configuration Parameters
Cache parameters (see in section Common parameters)
bypass: true or false value to bypass the deduplicator. When set to true, messages are not deduplicated. Default: false
Parameters for “fine-grained” deduplication
filter_type: type of the filtering, which can be “blacklist” or “whitelist”. The filter type defines how the Deduplicator bot interprets the parameter filter_keys in order to decide whether an event has already been seen (i.e., is a duplicate) or is a completely new event.
“whitelist” configuration: only the keys listed in filter_keys will be considered to verify if an event is duplicated or not.
“blacklist” configuration: all keys except those in filter_keys will be considered to verify if an event is duplicated or not.
filter_keys: string with multiple keys separated by commas. Please note that the time.observation key will not be considered even if defined, because the system always ignores that key.
When using a whitelist field pattern and a small number of fields (keys), it becomes more important that these fields exist in the events themselves. If a field does not exist but is part of the hashing/deduplication, this field will be ignored. If such events should not get deduplicated, you need to filter them out before the deduplication process, e.g. using a sieve expert. See also this discussion thread on the mailing-list.
Parameters Configuration Example
Example 1
The bot with this configuration will detect duplication only based on source.ip and destination.ip keys.
parameters:
redis_cache_db: 6
redis_cache_host: "127.0.0.1"
redis_cache_password: null
redis_cache_port: 6379
redis_cache_ttl: 86400
filter_type: "whitelist"
filter_keys: "source.ip,destination.ip"
Example 2
The bot with this configuration will detect duplication based on all keys, except source.ip and destination.ip keys.
parameters:
redis_cache_db: 6
redis_cache_host: "127.0.0.1"
redis_cache_password: null
redis_cache_port: 6379
redis_cache_ttl: 86400
filter_type: "blacklist"
filter_keys: "source.ip,destination.ip"
Flushing the cache
To flush the deduplicator’s cache, you can use the redis-cli tool. Enter the database used by the bot and submit the flushdb command:
redis-cli -n 6
flushdb
DO Portal Expert Bot¶
Information
name: intelmq.bots.experts.do_portal.expert
lookup: yes
public: no
cache (redis db): none
description: The DO portal retrieves the contact information from a DO portal instance: http://github.com/certat/do-portal/
Configuration Parameters
mode - Either replace or append the new abuse contacts in case there are existing ones.
portal_url - The URL to the portal, without the API-path. The used URL is $portal_url + ‘/api/1.0/ripe/contact?cidr=%s’.
portal_api_key - The API key of the user to be used. Must have sufficient privileges.
Field Reducer Bot¶
Information
name: intelmq.bots.experts.field_reducer.expert
lookup: none
public: yes
cache (redis db): none
description: The field reducer bot is capable of removing fields from events.
Configuration Parameters
type - either “whitelist” or “blacklist”
keys - Can be a JSON-list of field names ([“raw”, “source.account”]) or a string with a comma-separated list of field names (“raw,source.account”).
Whitelist
Only the fields in keys will be passed along.
Blacklist
The fields in keys will be removed from events.
Filter¶
The filter bot is capable of filtering specific events.
Information
name: intelmq.bots.experts.filter.expert
lookup: none
public: yes
cache (redis db): none
description: A simple filter for messages (drop or pass) based on an exact string comparison or regular expression
Configuration Parameters
Parameters for filtering with key/value attributes
filter_key - key from data format
filter_value - value for the key
filter_action - action when a message matches the criteria (possible actions: keep/drop)
filter_regex - determines whether filter_value shall be treated as a regular expression. If this attribute is not empty (can be true, yes or whatever), the bot uses Python’s re.search function (https://docs.python.org/3/library/re.html#re.search) to evaluate the filter with regular expressions. If this attribute is empty or evaluates to false, an exact string comparison is performed. A check on string inequality can be achieved with the usage of Paths described below.
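For instance, a filter that drops all events whose malware.name matches a given regular expression could look like this (a minimal sketch; the value is only an example):
parameters:
  filter_key: "malware.name"
  filter_value: "^test-.*$"
  filter_regex: true
  filter_action: "drop"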
Parameters for time based filtering
not_before - events before this time will be dropped
not_after - events after this time will be dropped
Both parameters accept string values describing absolute or relative time:
absolute
basically anything parseable by the datetime parser, e.g. “2015-09-01T06:22:11+00:00”
time.source taken from the event will be compared to this value to decide the filter behavior
relative
accepted string formatted like this “<integer> <epoch>”, where epoch could be any of following strings (could optionally end with trailing ‘s’): hour, day, week, month, year
time.source taken from the event will be compared to the value (now - relative) to decide the filter behavior
Examples of time filter definition
`"not_before" : "2015-09-012T06:22:11+00:00"`
events older than the specified time will be dropped`"not_after" : "6 months"`
just events older than 6 months will be passed through the pipeline
Possible paths
_default: default path, according to the configuration
action_other: Negation of the default path
filter_match: For all events the filter matched on
filter_no_match: For all events the filter does not match
action | match | _default | action_other | filter_match | filter_no_match
keep   | ✓     | ✓        | ✗            | ✓            | ✗
keep   | ✗     | ✗        | ✓            | ✗            | ✓
drop   | ✓     | ✗        | ✓            | ✓            | ✗
drop   | ✗     | ✓        | ✗            | ✗            | ✓
In DEBUG logging level, one can see that the message is sent to both matching paths, even if one of the paths is not configured. Of course the message is only delivered to the configured paths.
Format Field¶
Information
name: intelmq.bots.experts.format_field.expert
lookup: none
cache (redis db): none
description: String method operations on column values
Configuration Parameters
Parameters for stripping chars
strip_columns - A list of strings or a string of comma-separated values with field names. The names must match the IntelMQ Data Format field names. E.g.
"strip_columns": [ "malware.name", "extra.tags" ],
is equivalent to:
"strip_columns": "malware.name,extra.tags"
strip_chars - a set of characters to remove as leading/trailing characters (default: space)
Parameters for replacing chars
replace_column - key from data format
old_value - the string to search for
new_value - the string to replace the old value with
replace_count - number specifying how many occurrences of the old value you want to replace (default: 1)
Parameters for splitting string to list of string
split_column - key from data format
split_separator - specifies the separator to use when splitting the string (default: ,)
Order of operation: strip -> replace -> split. These three methods can be combined, e.g. first strip and then split.
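A minimal sketch combining two of the operations, first stripping whitespace from a field and then splitting it into a list (the field name is an example):
parameters:
  strip_columns: "extra.tags"
  strip_chars: " "
  split_column: "extra.tags"
  split_separator: ","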
Generic DB Lookup¶
This bot is capable of enriching intelmq events by lookups to a database. Currently only PostgreSQL and SQLite are supported.
If more than one result is returned, a ValueError is raised.
Information
name: intelmq.bots.experts.generic_db_lookup.expert
lookup: database
public: yes
cache (redis db): none
description: This bot is capable of enriching intelmq events by lookups to a database.
Configuration Parameters
Connection
engine: postgresql or sqlite
database: string, defaults to “intelmq”, database name or the SQLite filename
table: defaults to “contacts”
PostgreSQL specific
host: string, defaults to “localhost”
password: string
port: integer, defaults to 5432
sslmode: string, defaults to “require”
user: defaults to “intelmq”
Lookup
match_fields: defaults to {“source.asn”: “asn”}
The value is a key-value mapping of an arbitrary number of intelmq field names to table column names. The values are compared with = only.
Replace fields
overwrite: defaults to false. Is applied per field
replace_fields: defaults to {“contact”: “source.abuse_contact”}
replace_fields is again a key-value mapping of an arbitrary number of table column names to intelmq field names
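For example, to look up the abuse contact for the source ASN in a PostgreSQL contacts table (a minimal sketch; the table and column names are assumptions):
parameters:
  engine: "postgresql"
  host: "localhost"
  database: "intelmq"
  table: "contacts"
  match_fields: {"source.asn": "asn"}
  replace_fields: {"contact": "source.abuse_contact"}
  overwrite: false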
Gethostbyname¶
Information
name: intelmq.bots.experts.gethostbyname.expert
lookup: DNS
public: yes
cache (redis db): none
description: DNS name (FQDN) to IP
Configuration Parameters
fallback_to_url: If true and no source.fqdn is present, use source.url instead while producing source.ip
gaierrors_to_ignore: Optional, list (comma-separated) of gaierror codes to ignore, e.g. -3 for EAI_AGAIN (Temporary failure in name resolution). Only accepts the integer values, not the names.
overwrite: Boolean. If true, overwrite existing IP addresses. Default: False.
Description
Resolves the source/destination.fqdn hostname using the gethostbyname syscall and saves the resulting IP address as source/destination.ip. The following gaierror resolution errors are ignored and treated as if the hostname cannot be resolved:
-2/EAI_NONAME: NAME or SERVICE is unknown
-4/EAI_FAIL: Non-recoverable failure in name res.
-5/EAI_NODATA: No address associated with NAME.
-8/EAI_SERVICE: SERVICE not supported for `ai_socktype’.
-11/EAI_SYSTEM: System error returned in `errno’.
Other errors result in an exception if not ignored by the parameter gaierrors_to_ignore (see above). All gaierrors can be found here: http://www.castaglia.org/proftpd/doc/devel-guide/src/lib/glibc-gai_strerror.c.html
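A minimal configuration sketch that falls back to source.url and additionally ignores temporary resolution failures (EAI_AGAIN, code -3):
parameters:
  fallback_to_url: true
  gaierrors_to_ignore: "-3"
  overwrite: false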
HTTP Status¶
Fetches the HTTP Status for a given URI
Information
name: intelmq.bots.experts.http.expert_status
description: The bot fetches the HTTP status for a given URL and saves it in the event.
Configuration Parameters
field: The name of the field containing the URL to be checked (required).
success_status_codes: A list of success status codes. If this parameter is omitted or the list is empty, successful status codes are the ones between 200 and 400.
overwrite: Specifies if an existing ‘status’ value should be overwritten.
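A minimal sketch that checks source.url and treats only status code 200 as success (the values are examples):
parameters:
  field: "source.url"
  success_status_codes: [200]
  overwrite: true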
HTTP Content¶
Fetches an HTTP resource and checks if it contains a specific string.
Information
name: intelmq.bots.experts.http.expert_content
description: The bot fetches an HTTP resource and checks if it contains a specific string.
Configuration Parameters
field: The name of the field containing the URL to be checked (defaults to source.url)
needle: The string to search for in the content available at the URL
overwrite: A boolean value that specifies if an existing ‘status’ value should be overwritten.
IDEA Converter¶
Converts the event to IDEA format and saves it as JSON in the field output. All other fields are not modified.
Documentation about IDEA: https://idea.cesnet.cz/en/index
Information
name: intelmq.bots.experts.idea.expert
lookup: no
public: yes
cache (redis db): none
description: The bot does a best effort translation of events into the IDEA format.
Configuration Parameters
test_mode: add Test category to mark all outgoing IDEA events as informal (meant to simplify setting up and debugging new IDEA producers) (default: true)
Jinja2 Template Expert¶
This bot lets you modify the content of your IntelMQ message fields using Jinja2 templates.
Documentation about Jinja2 templating language: https://jinja.palletsprojects.com/
Information
name: intelmq.bots.experts.jinja.expert
description: Modify the content of IntelMQ messages using jinja2 templates
Configuration Parameters
fields: a dict containing as key the name of the field where the result of the Jinja2 template should be written to, and as value either a Jinja2 template or a filepath to a Jinja2 template file (starting with file:///). Because the expert decides whether it is a filepath based on the value starting with file:///, it is not possible to simply write values starting with file:/// to fields. The object containing the existing message will be passed to the Jinja2 template with the name msg.
fields:
  output: The provider is {{ msg['feed.provider'] }}!
  feed.url: "{{ msg['feed.url'] | upper }}"
  extra.somejinjaoutput: file:///etc/intelmq/somejinjatemplate.j2
Lookyloo¶
Lookyloo is a website screenshotting and analysis tool. For more information and installation instructions visit https://www.lookyloo.eu/
The bot sends a request for source.url to the configured Lookyloo instance and saves the retrieved website screenshot link in the field screenshot_url. Lookyloo only queues the website for screenshotting, therefore the screenshot may not be ready directly after the bot requested it. The pylookyloo library is required for this bot. The http_user_agent parameter is passed on, but not other HTTP-related parameters like proxies.
Events without source.url are ignored.
Information
name: intelmq.bots.experts.lookyloo.expert
description: LookyLoo expert bot for automated website screenshots
Configuration Parameters
instance_url: LookyLoo instance to connect to
MaxMind GeoIP¶
Information
name: intelmq.bots.experts.maxmind_geoip.expert
lookup: local database
public: yes
cache (redis db): none
description: IP to geolocation
Setup
The bot requires MaxMind’s geoip2 Python library; version 2.2.0 has been tested.
To download the database a free license key is required. More information can be found at https://blog.maxmind.com/2019/12/18/significant-changes-to-accessing-and-using-geolite2-databases/
Configuration Parameters
database: Path to the local database, e.g. “/opt/intelmq/var/lib/bots/maxmind_geoip/GeoLite2-City.mmdb”
overwrite: boolean
use_registered: boolean. MaxMind has two country ISO codes: One for the physical location of the address and one for the registered location. Default is false (backwards-compatibility). See also https://github.com/certtools/intelmq/pull/1344 for a short explanation.
license_key: License key is necessary for downloading the GeoLite2 database.
Database
Use this command to create/update the database and reload the bot:
intelmq.bots.experts.maxmind_geoip.expert --update-database
MISP¶
Queries a MISP instance for the source.ip and adds the MISP Attribute UUID and MISP Event ID of the newest attribute found.
Information
name: intelmq.bots.experts.misp.expert
lookup: yes
public: no
cache (redis db): none
description: IP address to MISP attribute and event
Configuration Parameters
misp_key: MISP Authkey
misp_url: URL of MISP server (with trailing ‘/’)
Generic parameters used in this bot:
http_verify_cert: Verify the TLS certificate of the server, boolean (default: true)
McAfee Active Response lookup¶
Information
name: intelmq.bots.experts.mcafee.expert_mar
lookup: yes
public: no
cache (redis db): none
description: Queries DXL bus for hashes, IP addresses or FQDNs.
Configuration Parameters
dxl_config_file: location of file containing required information to connect to DXL bus
lookup_type: One of:
Hash: looks up malware.hash.md5, malware.hash.sha1 and malware.hash.sha256
DestSocket: looks up destination.ip and destination.port
DestIP: looks up destination.ip
DestFQDN: looks up destination.fqdn
Modify¶
Information
name: intelmq.bots.experts.modify.expert
lookup: local config
public: yes
cache (redis db): none
description: modify expert bot allows you to change arbitrary field values of events just using a configuration file
Configuration Parameters
configuration_path: filename
case_sensitive: boolean, default: true
maximum_matches: Maximum number of matches. Processing stops after the limit is reached. Default: no limit (null, 0).
overwrite: Overwrite any existing fields by matching rules. Default if the parameter is given: true, for backwards compatibility. Default will change to false in version 3.0.0.
Configuration File
The modify expert bot allows you to change arbitrary field values of events just using a configuration file. Thus it is possible to adapt certain values or adding new ones only by changing JSON-files without touching the code of many other bots.
The configuration is called modify.conf and looks like this:
[
{
"rulename": "Standard Protocols http",
"if": {
"source.port": "^(80|443)$"
},
"then": {
"protocol.application": "http"
}
},
{
"rulename": "Spamhaus Cert conficker",
"if": {
"malware.name": "^conficker(ab)?$"
},
"then": {
"classification.identifier": "conficker"
}
},
{
"rulename": "bitdefender",
"if": {
"malware.name": "bitdefender-(.*)$"
},
"then": {
"malware.name": "{matches[malware.name][1]}"
}
},
{
"rulename": "urlzone",
"if": {
"malware.name": "^urlzone2?$"
},
"then": {
"classification.identifier": "urlzone"
}
},
{
"rulename": "default",
"if": {
"feed.name": "^Spamhaus Cert$"
},
"then": {
"classification.identifier": "{msg[malware.name]}"
}
}
]
In our example above we have five groups labeled Standard Protocols http, Spamhaus Cert conficker, bitdefender, urlzone and default. All sections will be considered, in the given order (from top to bottom).
Each rule consists of conditions and actions. Conditions and actions are dictionaries holding the field names of events and regular expressions to match values (selection) or set values (action). All matching rules will be applied in the given order. The actions are only performed if all selections apply.
If the value for a condition is an empty string, the bot checks if the field does not exist. This is useful to apply default values for empty fields.
Actions
You can set the value of the field to a string literal or number.
In addition you can use the standard Python string format syntax to access the values from the processed event as msg and the match groups of the conditions as matches, see the bitdefender example above. Group 0 ([0]) contains the full matching string. See also the documentation on re.Match.group.
Note that matches will also contain the match groups from the default conditions if there were any.
Examples
We have an event with feed.name = Spamhaus Cert and malware.name = confickerab. The expert loops over all rules in the file, in the given order. Eventually it reaches the rule conficker: its conditions, combined with the default conditions, match the event, so the action is applied and classification.identifier is set to conficker, the trivial name.
Assume we have an event with feed.name = Spamhaus Cert and malware.name = feodo. The default condition matches, but no others. So the default action is applied. The value for classification.identifier will be set to feodo by {msg[malware.name]}.
Types
If the rule is a string, a regular expression search is performed, also for numeric values (str() is called on them). If the rule is numeric for numeric values, a simple comparison is done. If other types are mixed, a warning will be thrown.
For boolean values, the comparison value needs to be true or false as in JSON they are written all-lowercase.
National CERT contact lookup by CERT.AT¶
Information
name: intelmq.bots.experts.national_cert_contact_certat.expert
lookup: https
public: yes
cache (redis db): none
description: https://contacts.cert.at offers an IP address to national CERT contact (and cc) mapping. See https://contacts.cert.at for more info.
Configuration Parameters
filter: (true/false) act as a filter for AT.
overwrite_cc: set to true if you want to overwrite any potentially existing cc fields in the event.
RDAP¶
Information
name: intelmq.bots.experts.rdap.expert
lookup: http/https
public: yes/no
cache (redis db): 5
description: Asks rdap servers for a given domain.
Configuration Parameters
rdap_order: a list of strings, default ['abuse', 'technical']. Search order of contacts with these roles.
rdap_bootstrapped_servers: Customized RDAP servers. Do not forget the trailing slash. For example:
{
"at": {
"url": "rdap.server.at/v1/,
"auth": {
"type": "jwt",
"token": "ey..."
}
},
"de": "rdap.service:1337/v1/"
}
RecordedFuture IP risk¶
This bot tags events with the risk score found in Recorded Future’s large IP risklist.
Information
name: intelmq.bots.experts.recordedfuture_iprisk.expert
lookup: local database
public: no
cache (redis db): none
description: Records the risk score associated with source and destination IP addresses if they are present. Assigns 0 to IP addresses not in the RF list.
Configuration Parameters
database: Location of csv file obtained from recorded future API (a script is provided to download the large IP set)
overwrite: set to true if you want to overwrite any potentially existing risk score fields in the event.
api_token: This needs to contain a valid API token to download the latest database data.
Description
For both source.ip and destination.ip the corresponding risk score is fetched from a local database created from Recorded Future’s API. The score is recorded in extra.rf_iprisk.source and extra.rf_iprisk.destination. If a lookup for an IP fails a score of 0 is recorded.
See https://www.recordedfuture.com/products/api/ and speak with your recorded future representative for more information.
The list is obtained from the Recorded Future API and needs a valid API token. The large list contains all IPs with a risk score of 25 or more. If IPs are not present in the database, a risk score of 0 is given.
A script is supplied that may be run as intelmq to update the database.
Database
Use this command to create/update the database and reload the bot:
intelmq.bots.experts.recordedfuture_iprisk.expert --update-database
Reverse DNS¶
For both source.ip and destination.ip the PTR record is fetched and the first valid result is used for source.reverse_dns/destination.reverse_dns.
Information
name: intelmq.bots.experts.reverse_dns.expert
lookup: DNS
public: yes
cache (redis db): 8
description: IP to domain
Configuration Parameters
Cache parameters (see in section Common parameters)
cache_ttl_invalid_response: The TTL for cached invalid responses.
overwrite: Overwrite existing fields. Default: True if not given (for backwards compatibility, will change in version 3.0.0)
RFC1918¶
Several RFCs define ASNs, IP addresses and hostnames (and TLDs) reserved for documentation. Events or fields of events can be dropped if they match the criteria of either being reserved for documentation (e.g. AS 64496, domain example.com) or belonging to a local area network (e.g. 192.168.0.0/24). These checks can be applied to URLs, IP addresses, FQDNs and ASNs.
It is configurable if the whole event should be dropped (“policies”) or just the field removed, as well as which fields should be checked.
Information
name: intelmq.bots.experts.rfc1918.expert
lookup: none
public: yes
cache (redis db): none
description: removes events or single fields with invalid data
Configuration Parameters
fields: string, comma-separated list of fields e.g. destination.ip,source.asn,source.url. Supported fields are:
destination.asn & source.asn
destination.fqdn & source.fqdn
destination.ip & source.ip
destination.url & source.url
policy: string, comma-separated list of policies, e.g. del,drop,drop. drop causes the entire event to be removed if the field matches; del causes only the field to be removed.
With the example parameter values given above, this means that:
If a destination.ip value is part of a reserved network block, the field will be removed (policy “del”).
If a source.asn value is in the range of reserved AS numbers, the event will be removed altogether (policy “drop”).
If a source.url value contains a host with either an IP address part of a reserved network block, or a reserved domain name (or with a reserved TLD), the event will be dropped (policy “drop”)
RIPE¶
Online RIPE Abuse Contact and Geolocation Finder for IP addresses and Autonomous Systems.
Information
name: intelmq.bots.experts.ripe.expert
lookup: HTTPS API
public: yes
cache (redis db): 10
description: IP to abuse contact
Configuration Parameters
Cache parameters (see section Common parameters)
mode: either append (default) or replace
query_ripe_db_asn: Query for ASNs at http://rest.db.ripe.net/abuse-contact/as%s.json, default true
query_ripe_db_ip: Query for IPs at http://rest.db.ripe.net/abuse-contact/%s.json, default true
query_ripe_stat_asn: Query for ASNs at https://stat.ripe.net/data/abuse-contact-finder/data.json?resource=%s, default true
query_ripe_stat_ip: Query for IPs at https://stat.ripe.net/data/abuse-contact-finder/data.json?resource=%s, default true
query_ripe_stat_geolocation: Query for IPs at https://stat.ripe.net/data/maxmind-geo-lite/data.json?resource=%s, default true
Sieve¶
Information
name: intelmq.bots.experts.sieve.expert
lookup: none
public: yes
cache (redis db): none
description: Filtering with a sieve-based configuration language
Configuration Parameters
file: Path to sieve file. Syntax can be validated with intelmq_sieve_expert_validator.
Description
The sieve bot is used to filter and/or modify events based on a set of rules. The rules are specified in an external configuration file and with a syntax similar to the Sieve language used for mail filtering.
Each rule defines a set of matching conditions on received events. Events can be matched based on keys and values in the event. Conditions can be combined using parentheses and the boolean operators && and ||. If the processed event matches a rule’s conditions, the corresponding actions are performed. Actions can specify whether the event should be kept or dropped in the pipeline (filtering actions) or if keys and values should be changed (modification actions).
Requirements
To use this bot, you need to install the required dependencies:
pip3 install -r intelmq/bots/experts/sieve/REQUIREMENTS.txt
Examples
The following excerpts illustrate some of the basic features of the sieve file format:
if :exists source.fqdn {
keep // aborts processing of subsequent rules and forwards the event.
}
if :notexists source.abuse_contact || source.abuse_contact =~ '.*@example.com' {
drop // aborts processing of subsequent rules and drops the event.
}
if source.ip << '192.0.0.0/24' {
add! comment = 'bogon' // sets the field comment to this value and overwrites existing values
path 'other-path' // the message is sent to the given path
}
if classification.type :in ['phishing', 'malware-distribution'] && source.fqdn =~ '.*\.(ch|li)$' {
add! comment = 'domainabuse'
keep
} elif classification.type == 'scanner' {
add! comment = 'ignore'
drop
} else {
remove comment
}
Reference
Sieve File Structure
The sieve file contains an arbitrary number of rules of the form:
if EXPRESSION {
ACTIONS
} elif EXPRESSION {
ACTIONS
} else {
ACTIONS
}
Nested if-statements and mixed if statements and rules in the same scope are possible.
Expressions
Each rule specifies one or more expressions to match an event based on its keys and values. Event keys are specified as strings without quotes. String values must be enclosed in single quotes. Numeric values can be specified as integers or floats and are unquoted. IP addresses and network ranges (IPv4 and IPv6) are specified with quotes. List values for use with list/set operators are specified as string, float, int, bool and string literals separated by commas and enclosed in square brackets.
Expression statements can be combined and chained using parentheses and the boolean operators && and ||.
The following operators may be used to match events:
:exists and :notexists match if a given key exists, for example:
if :exists source.fqdn { ... }
== and != match for equality of strings, numbers, and booleans, for example:
if feed.name != 'acme-security' || feed.accuracy == 100 || extra.false_positive == false { ... }
:contains matches on substrings.
=~ matches strings based on the given regular expression. !~ is the inverse regular expression match.
Numerical comparisons are evaluated with <, <=, >, >=.
<< matches if an IP address is contained in the specified network range:
if source.ip << '10.0.0.0/8' { ... }
String values to match against can also be specified as lists of strings, which have separate operators. For example:
if source.ip :in ['8.8.8.8', '8.8.4.4'] { ... }
In this case, the event will match if it contains a key source.ip with either value 8.8.8.8 or 8.8.4.4.
There are also :containsany to match at least one of a list of substrings, and :regexin to match at least one of a list of regular expressions, similar to the :contains and =~ operators.
Lists of numeric values support :in to check for inclusion in a list of numbers:
if source.port :in [80, 443] { ... }
:equals tests for equality between lists, including order. Example for checking a hostname-port pair:
if extra.host_tuple :equals ['dns.google', 53] { ... }
:setequals tests for set-based equality (ignoring duplicates and value order) between a list of given values. Example for checking for the first nameserver of two domains, regardless of the order they are given in the list:
if extra.hostnames :setequals ['ns1.example.com', 'ns1.example.mx'] { ... }
:overlaps tests if there is at least one element in common between the list specified by a key and a list of values. Example for checking if at least one of the ics, database or vulnerable tags is given:
if extra.tags :overlaps ['ics', 'database', 'vulnerable'] { ... }
:subsetof tests if the list of values from the given key only contains values from a set of values specified as the argument. Example for checking for a host that has only ns1.example.com and/or ns2.[…] as its apparent hostname:
if extra.hostnames :subsetof ['ns1.example.com', 'ns2.example.com'] { ... }
:supersetof tests if the list of values from the given key is a superset of the values specified as the argument. Example for matching hosts with at least the IoT and vulnerable tags:
if extra.tags :supersetof ['iot', 'vulnerable'] { ... }
Boolean values can be matched with == or != followed by true or false. Example:
if extra.has_known_vulns == true { ... }
The combination of multiple expressions can be done using parenthesis and boolean operators:
if (source.ip == '127.0.0.1') && (comment == 'add field' || classification.taxonomy == 'vulnerable') { ... }
Any single expression or a parenthesised group of expressions can be negated using !:
if ! source.ip :contains '127.0.0.' || ! ( source.ip == '172.16.0.5' && source.port == 25 ) { ... }
Note: Since 3.0.0, list-based operators are used on list values, such as foo :in [1, 2, 3] instead of foo == [1, 2, 3] and foo :regexin ['.mx', '.zz'] rather than foo =~ ['.mx', '.zz'], and similarly for :containsany vs :contains. Besides that, :notcontains has been removed, with e.g. foo :notcontains ['.mx', '.zz'] now being represented using negation as ! foo :contains ['.mx', '.zz'].
Actions
If part of a rule matches the given conditions, the actions enclosed in { and } are applied. By default, all events that are matched or not matched by rules in the sieve file will be forwarded to the next bot in the pipeline, unless the drop action is applied.
add adds a key value pair to the event. It can be a string, number, or boolean. This action only applies if the key is not yet defined in the event. If the key is already defined, the action is ignored. Example:
add comment = 'hello, world'
Some basic mathematical expressions are possible; currently only relative time specifications are supported. For example:
`add time.observation += '1 hour'`
`add time.observation -= '10 hours'`
add! same as above, but will force overwrite the key in the event.
update modifies an existing value for a key. Only applies if the key is already defined. If the key is not defined in the event, this action is ignored. This supports mathematical expressions like above. Example:
update feed.accuracy = 50
Some basic mathematical expressions are possible; currently only relative time specifications are supported. For example:
`update time.observation += '1 hour'`
`update time.observation -= '10 hours'`
remove removes a key/value from the event. Action is ignored if the key is not defined in the event. Example:
remove extra.comments
keep sends the message to the next bot in the pipeline (same as the default behaviour), and stops sieve file processing.
keep
path sets the path (named queue) the message should be sent to (implicitly or with the command keep). The named queue needs to be configured in the pipeline, see the User Guide for more information.
path 'named-queue'
You can as well set multiple destination paths with the same syntax as for value lists:
path ['one', 'two']
This will result in two identical messages, one sent to the path one and the other sent to the path two.
If the path is not configured, an error is raised when the message is sent.
drop marks the event to be dropped. The event will not be forwarded to the next bot in the pipeline. The sieve file processing is interrupted upon reaching this action. No other actions may be specified besides the drop action within { and }.
Comments
Comments may be used in the sieve file: all characters after // and until the end of the line will be ignored.
Validating a sieve file
Use the following command to validate your sieve files:
$ intelmq.bots.experts.sieve.validator
usage: intelmq.bots.experts.sieve.validator [-h] sievefile
Validates the syntax of sievebot files.
positional arguments:
sievefile Sieve file
optional arguments:
-h, --help show this help message and exit
Splunk saved search¶
Information
name: intelmq.bots.experts.splunk_saved_search.expert
lookup: splunk database
public: no
cache (redis db): none
description: Enrich an event from Splunk search results.
Configuration Parameters
HTTP parameters (see above)
auth_token: String, Splunk API authentication token
url: String, base URL of the Splunk REST API
retry_interval: Integer, optional, default 5, number of seconds to wait between polling for search results to be available
saved_search: String, name of Splunk saved search to run
search_parameters: Array of string->string, optional, default {}, IntelMQ event fields containing the data to search for, mapped to parameters of the Splunk saved search. Example: "search_parameters": { "source.ip": "ip" }
result_fields: Array of string->string, optional, default {}, Splunk search result fields mapped to IntelMQ event fields to store the results in. Example: "result_fields": { "username": "source.account" }
not_found: List of strings, default [ "warn", "send" ], what to do if the search returns zero results. All specified actions are performed. Valid values are:
warn: log a warning message
send: send the event on unmodified
drop: drop the message
send and drop are mutually exclusive
multiple_result_handling: List of strings, default [ "warn", "use_first", "send" ], what to do if the search returns more than one result. All specified actions are performed. Valid values are:
limit: limit the search so that duplicates are impossible
warn: log a warning message
use_first: use the first search result
ignore: do not modify the event
send: send the event on
drop: drop the message
limit cannot be combined with any other value
send and drop are mutually exclusive
ignore and use_first are mutually exclusive
overwrite: Boolean or null, optional, default null, whether search results overwrite values already in the message or not. If null, attempting to add a field that already exists throws an exception.
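Putting these parameters together, a minimal runtime configuration sketch (the URL, token and saved-search name are placeholders) could look like:
parameters:
  url: "https://splunk.example.com:8089"
  auth_token: "REPLACE-ME"
  saved_search: "dhcp-lookup"
  search_parameters: {"source.ip": "ip"}
  result_fields: {"username": "source.account"}
  not_found: ["warn", "send"]
  multiple_result_handling: ["warn", "use_first", "send"]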
Description
Runs a saved search in Splunk using fields in an event, adding fields from the search result into the event.
Splunk documentation on saved searches: https://docs.splunk.com/Documentation/Splunk/latest/Report/Createandeditreports
The saved search should take parameters according to the search_parameters configuration and deliver results according to result_fields. The examples above match a saved search of this format:
index="dhcp" ipv4address="$ip$" | ... | fields _time username ether
The time window used is the one saved with the search.
Waits for Splunk to return an answer for each message, so slow searches will delay the entire botnet. If you anticipate a load of more than one search every few seconds, consider running multiple load-balanced copies of this bot.
Taxonomy¶
Information
name: intelmq.bots.experts.taxonomy.expert
lookup: no
public: yes
cache (redis db): none
description: Adds the classification.taxonomy field according to the RSIT taxonomy.
Please note that there is a slight mismatch of IntelMQ’s taxonomy to the upstream taxonomy, but it should not matter here much.
Configuration Parameters
None.
Description
Information on the “Reference Security Incident Taxonomy” can be found here: https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force
For brevity, “type” means classification.type and “taxonomy” means classification.taxonomy.
If taxonomy is missing and type is given, the corresponding taxonomy is set.
If neither taxonomy nor type is given, taxonomy is set to “other” and type to “unknown”.
If taxonomy is given, but type is not, type is set to “unknown”.
Threshold¶
Information
name: intelmq.bots.experts.threshold.expert
lookup: redis cache
public: no
cache (redis db): 11
description: Check if the number of similar messages during a specified time interval exceeds a set value.
Configuration Parameters
Cache parameters (see section Common parameters), especially redis_cache_ttl as the number of seconds before the threshold counter is reset (since version 3.1; until 3.1, timeout was used).
filter_keys: String, comma-separated list of field names to consider or ignore when determining which messages are similar.
filter_type: String, whitelist (consider only the fields in filter_keys) or blacklist (consider everything but the fields in filter_keys).
threshold: Integer, number of messages required before propagating one. In forwarded messages, the threshold is saved in the message as extra.count.
add_keys: Array of string->string, optional, fields and values to add (or update) to propagated messages. Example:
"add_keys": { "classification.type": "spam", "comment": "Started more than 10 SMTP connections" }
Limitations
This bot has certain limitations and is not a true threshold filter (yet). It works like this:
Every incoming message is hashed according to the filter_* parameters.
The hash is looked up in the cache and the count is incremented by 1, and the TTL of the key is (re-)set to the timeout.
If the new count matches the threshold exactly, the message is forwarded. Otherwise it is dropped.
Please note: Even if a message is sent, any further identical messages are dropped, if the time difference to the last message is less than the timeout! The counter is not reset if the threshold is reached.
Tor Nodes¶
Information
name: intelmq.bots.experts.tor_nodes.expert
lookup: local database
public: yes
cache (redis db): none
description: check if IP is tor node
Configuration Parameters
database: Path to the database
Database
Use this command to create/update the database and reload the bot:
intelmq.bots.experts.tor_nodes.expert --update-database
Trusted Introducer Lookup Expert¶
Information
name: intelmq.bots.experts.trusted_introducer_lookup.expert
lookup: internet
public: yes
cache (redis db): none
description: Looks up data from the Trusted Introducer public teams list.
Configuration Parameters
order: Possible values are ‘domain’, ‘asn’. You can set multiple values, so the first match wins.
If ‘domain’ is set, it will lookup the source.fqdn field. It will go from high-order to low-order, i.e. 1337.super.example.com -> super.example.com -> example.com -> .com
If ‘asn’ is set, it will lookup source.asn.
After a match, the abuse contact will be fetched from the trusted introducer teams list and will be stored in the event as source.abuse_contact. If there is no match, the event will not be enriched and will be sent to the next configured step.
Tuency¶
Information
name: intelmq.bots.experts.tuency.expert
lookup: yes
public: no
cache (redis db): none
description: Queries the IntelMQ API of a Tuency Contact Database instance.
Configuration Parameters
url: Tuency instance URL, without the API path.
authentication_token: The Bearer authentication token, without the Bearer prefix.
overwrite: Boolean, if existing data in source.abuse_contact should be overwritten. Default: true
Description
tuency is a contact management database addressing the needs of CERTs.
Users of tuency can configure contact addresses and delivery settings for IP objects (addresses, netblocks), Autonomous Systems, and (sub-)domains.
This expert queries the information for source.ip and source.fqdn using the following other fields:
classification.taxonomy
classification.type
feed.provider
feed.name
These fields therefore need to exist, otherwise the message is skipped.
The API parameter “feed_status” is currently set to “production” constantly, until IntelMQ supports this field.
The API answer is processed as follows. For the notification interval:
If suppress is true, then extra.notify is set to false.
Otherwise: if the interval is immediate, then extra.ttl is set to 0. Otherwise the interval is converted into seconds and saved in extra.ttl.
For the contact lookup: for both fields ip and domain, the destinations objects are iterated and their email fields concatenated to a comma-separated list in source.abuse_contact.
The IntelMQ fields used by this bot may change in the next IntelMQ release, as soon as better suited fields are available.
Truncate By Delimiter¶
Information
name: intelmq.bots.experts.truncate_by_delimiter.expert
lookup: no
public: yes
cache (redis db): none
description: Cut string if length is bigger than maximum length
Configuration Parameters
delimiter: The delimiter to be used for truncating, for example . or ;
max_length: The maximum string length.
field: The field to be truncated, e.g. source.fqdn
The given field is truncated step-by-step using the delimiter from the beginning, until the field is shorter than max_length.
Example: Cut through a long domain with a dot. The string is truncated until the domain does not exceed the configured maximum length.
input domain (e.g. source.fqdn): www.subdomain.web.secondsubomain.test.domain.com
delimiter: .
max_length: 20
resulting value: test.domain.com (length: 15 characters)
URL¶
This bot extracts additional information from source.url and destination.url fields. It can fill the following fields:
source.fqdn
source.ip
source.port
source.urlpath
source.account
destination.fqdn
destination.ip
destination.port
destination.urlpath
destination.account
protocol.application
protocol.transport
Information
name: intelmq.bots.experts.url.expert
lookup: none
public: yes
cache (redis db): none
description: extract additional information from the URL
Configuration Parameters
overwrite: boolean, replace existing fields?
skip_fields: list of fields to not extract from the URL
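A minimal sketch that extracts all supported fields except the account information (the skipped fields are examples):
parameters:
  overwrite: false
  skip_fields: ["source.account", "destination.account"]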
Url2FQDN¶
This bot is deprecated and will be removed in version 4.0. Use ‘URL Expert’ bot instead.
This bot extracts the Host from the source.url and destination.url fields and writes it to source.fqdn or destination.fqdn if it is a hostname, or source.ip or destination.ip if it is an IP address.
Information
name: intelmq.bots.experts.url2fqdn.expert
lookup: none
public: yes
cache (redis db): none
description: writes domain name from URL to FQDN or IP address
Configuration Parameters
overwrite: boolean, replace existing FQDN / IP address?
uWhoisd¶
uWhoisd is a universal Whois server that supports caching and stores whois entries for historical purposes.
The bot sends a request for source.url, source.fqdn, source.ip or source.asn to the configured uWhoisd instance and saves the retrieved whois entry:
If both source.url and source.fqdn are present, it will only do a request for source.fqdn, as the hostname of source.url should be the same as source.fqdn. The whois entry will be saved in extra.whois.fqdn.
If source.ip is present, the whois entry will be saved in extra.whois.ip.
If source.asn is present, the whois entry will be saved in extra.whois.asn.
Events without source.url, source.fqdn, source.ip, or source.asn are ignored.
Note: requesting a whois entry for a fully qualified domain name (FQDN) only works if the request only contains the domain. uWhoisd will automatically strip the subdomain part if it is present in the request.
Example: https://www.theguardian.co.uk
TLD: co.uk (uWhoisd uses the Mozilla public suffix list as a reference)
Domain: theguardian.co.uk
Subdomain: www
The whois request will be for theguardian.co.uk
Information
name: intelmq.bots.experts.uwhoisd.expert
description: uWhoisd is a universal Whois server
Configuration Parameters
server: IP or hostname to connect to (default: localhost)
port: Port to connect to (default: 4243)
Wait¶
Information
name: intelmq.bots.experts.wait.expert
lookup: none
public: yes
cache (redis db): none
description: Waits for some time or until a queue size is lower than a given number.
Configuration Parameters
queue_db: Database number of the database, default 2. Converted to integer.
queue_host: Host of the database, default localhost.
queue_name: Name of the queue to be watched, default null. This is not the name of a bot but the queue’s name.
queue_password: Password for the database, default None.
queue_polling_interval: Interval to poll the list length in seconds. Converted to float.
queue_port: Port of the database, default 6379. Converted to integer.
queue_size: Maximum size of the queue, default 0. Compared by <=. Converted to integer.
sleep_time: Time to sleep before sending the event.
Only one of the two modes is possible. If a queue name is given, the queue mode is active. If the sleep_time is a number, sleep mode is active. Otherwise the dummy mode is active, the events are just passed without an additional delay.
Note that SIGHUPs and reloads interrupt the sleeping.
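For example, to hold events back until a parser’s destination queue has at most 10000 entries (the queue name and size are examples):
parameters:
  queue_name: "shadowserver-parser-queue"
  queue_size: 10000
  queue_polling_interval: 0.5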
Output Bots¶
AMQP Topic¶
Sends data to an AMQP server. See https://www.rabbitmq.com/tutorials/amqp-concepts.html for more details on the AMQP topic exchange.
Requires the pika python library.
Information
name: intelmq.bots.outputs.amqptopic.output
lookup: to the amqp server
public: yes
cache: no
description: Sends the event to a specified topic of an AMQP server
Configuration parameters
connection_attempts : The number of connection attempts to defined server, defaults to 3
connection_heartbeat : Heartbeat to server, in seconds, defaults to 3600
connection_host : Name/IP for the AMQP server, defaults to 127.0.0.1
connection_port : Port for the AMQP server, defaults to 5672
connection_vhost : Virtual host to connect to; on an http(s) connection this would be http://IP/<your virtual host>
content_type : Content type to deliver to AMQP server, currently only supports “application/json”
delivery_mode : 1 - Non-persistent, 2 - Persistent. On persistent mode, messages are delivered to ‘durable’ queues and will be saved to disk.
exchange_durable : If set to True, the exchange will survive broker restart, otherwise will be a transient exchange.
exchange_name : The name of the exchange to use
exchange_type : Type of the exchange, e.g. topic, fanout etc.
keep_raw_field : If set to True, the message ‘raw’ field will be sent
password : Password for authentication on your AMQP server
require_confirmation : If set to True, an exception will be raised if a confirmation error is received
routing_key : The routing key for your amqptopic
single_key : Only send the field instead of the full event (expecting a field name as string)
username : Username for authentication on your AMQP server
use_ssl : Use ssl for the connection, make sure to also set the correct port, usually 5671 (true/false)
message_hierarchical_output: Convert the message to hierarchical JSON, default: false
message_with_type : Include the type in the sent message, default: false
message_jsondict_as_string: Convert fields of type JSONDict (extra) as string, default: false
If no authentication should be used, leave username or password empty or null.
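A minimal sketch sending events to a durable topic exchange on a local RabbitMQ server (the exchange and routing key names are examples):
parameters:
  connection_host: "127.0.0.1"
  connection_port: 5672
  exchange_name: "intelmq"
  exchange_type: "topic"
  exchange_durable: true
  delivery_mode: 2
  routing_key: "intelmq.events"
  content_type: "application/json"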
Examples of usage
Useful to send events to a RabbitMQ exchange topic to be further processed in other platforms.
Confirmation
If the routing key or exchange name is invalid or non-existent, the message is accepted by the server but we receive no confirmation. If the parameter require_confirmation is true and no confirmation is received, an error is raised.
Common errors
Unroutable messages / Undefined destination queue
The destination exchange and queue need to exist beforehand, with your preferred settings (e.g. durable, lazy queue). If the error message says that the message is “unroutable”, the queue doesn’t exist.
Blackhole¶
This output bot discards all incoming messages.
Information
name: intelmq.bots.outputs.blackhole.output
lookup: no
public: yes
cache: no
description: discards messages
Bro file¶
Information
name: intelmq.bots.outputs.bro_file.output
lookup: no
public: yes
cache: no
description: BRO (zeek) file output
Description
File example:
#fields indicator indicator_type meta.desc meta.cif_confidence meta.source
xxx.xxx.xxx.xxx Intel::ADDR phishing 100 MISP XXX
www.testdomain.com Intel::DOMAIN apt 85 CERT
Elasticsearch Output Bot¶
Information
name: intelmq.bots.outputs.elasticsearch.output
lookup: yes
public: yes
cache: no
description: Output Bot that sends events to Elasticsearch
Only Elasticsearch version 7 is supported.
It is also possible to feed data into Elasticsearch using the ELK stack via Redis and Logstash, see ELK Stack for more information. This method supports various versions of Elasticsearch.
Configuration parameters
elastic_host: Name/IP for the Elasticsearch server, defaults to 127.0.0.1
elastic_port: Port for the Elasticsearch server, defaults to 9200
elastic_index: Index for the Elasticsearch output, defaults to intelmq
rotate_index: If set, will index events using the date information associated with the event.
Options: ‘never’, ‘daily’, ‘weekly’, ‘monthly’, ‘yearly’. Using ‘intelmq’ as the elastic_index, the following are examples of the generated index names:
'never' --> intelmq
'daily' --> intelmq-2018-02-02
'weekly' --> intelmq-2018-42
'monthly' --> intelmq-2018-02
'yearly' --> intelmq-2018
http_username: HTTP basic authentication username
http_password: HTTP basic authentication password
use_ssl: Whether to use SSL/TLS when connecting to Elasticsearch. Default: False
http_verify_cert: Whether to require verification of the server’s certificate. Default: False
ssl_ca_certificate: An optional path to a certificate bundle to use for verifying the server
ssl_show_warnings: Whether to show warnings if the server’s certificate cannot be verified. Default: True
replacement_char: If set, dots (‘.’) in field names will be replaced with this character prior to indexing. This is for backward compatibility with ES 2.X. Default: null. Recommended for ES2.X: ‘_’
flatten_fields: In ES, some query and aggregations work better if the fields are flat and not JSON. Here you can provide a list of fields to convert.
Can be a list of strings (fieldnames) or a string with field names separated by a comma (,), e.g. extra,field2 or ['extra', 'field2']. Default: ['extra']
See contrib/elasticsearch/elasticmapper for a utility for creating Elasticsearch mappings and templates.
If using rotate_index, the resulting index name will be of the form [elastic_index]-[event date]. To query all intelmq indices at once, use an alias (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html), or a multi-index query.
The data in ES can be retrieved with the HTTP-Interface:
> curl -XGET 'http://localhost:9200/intelmq/events/_search?pretty=True'
File¶
Information
name: intelmq.bots.outputs.file.output
lookup: no
public: yes
cache (redis db): none
description: output messages (reports or events) to file
Multithreading is disabled for this bot, as this would lead to corrupted files.
Configuration Parameters
encoding_errors_mode: By default ‘strict’, see for more details and options: https://docs.python.org/3/library/functions.html#open For example with ‘backslashreplace’ all characters which cannot be properly encoded will be written escaped with backslashes.
file: file path of output file. Missing directories will be created if possible with the mode 755.
format_filename: Boolean if the filename should be formatted (default: false).
hierarchical_output: If true, the resulting dictionary will be hierarchical (field names split by dot).
single_key: if none, the whole event is saved (default); otherwise the bot saves only contents of the specified key. In case of raw the data is base64 decoded.
Filename formatting
The filename can be formatted using pythons string formatting functions if format_filename is set. See https://docs.python.org/3/library/string.html#formatstrings
For example:
The filename …/{event[source.abuse_contact]}.txt will be (for example) …/abuse@example.com.txt.
…/{event[time.source]:%Y-%m-%d} results in the date of the event used as filename.
If the field used in the format string is not defined, None will be used as fallback.
Files¶
Information
name: intelmq.bots.outputs.files.output
lookup: no
public: yes
cache (redis db): none
description: saving of messages as separate files
Configuration Parameters
dir: output directory (default /opt/intelmq/var/lib/bots/files-output/incoming)
tmp: temporary directory (must reside on the same filesystem as dir) (default: /opt/intelmq/var/lib/bots/files-output/tmp)
suffix: extension of created files (default .json)
hierarchical_output: if true, use nested dictionaries; if false, use flat structure with dot separated keys (default)
single_key: if none, the whole event is saved (default); otherwise the bot saves only contents of the specified key
McAfee Enterprise Security Manager¶
Information
name: intelmq.bots.outputs.mcafee.output_esm_ip
lookup: yes
public: no
cache (redis db): none
description: Writes information out to McAfee ESM watchlist
Configuration Parameters
Feed parameters (see above)
esm_ip: IP address of ESM instance
esm_user: username of user entitled to write to watchlist
esm_pw: password of user
esm_watchlist: name of the watchlist to write to
field: name of the IntelMQ field to be written to ESM
MISP Feed¶
Information
name: intelmq.bots.outputs.misp.output_feed
lookup: no
public: no
cache (redis db): none
description: Create a directory layout in the MISP Feed format
The PyMISP library >= 2.4.119.1 is required, see REQUIREMENTS.txt.
Configuration Parameters
Feed parameters (see above)
misp_org_name: Org name which creates the event, string
misp_org_uuid: Org UUID which creates the event, string
output_dir: Output directory path, e.g. /opt/intelmq/var/lib/bots/mispfeed-output. Will be created if it does not exist, if possible.
interval_event: The output bot creates one event per each interval, all data in this time frame is part of this event. Default “1 hour”, string.
Usage in MISP
Configure the destination directory of this feed as a feed in MISP, either as a local location, or served via a web server. See the MISP documentation on Feeds for more information.
MISP API¶
Information
name: intelmq.bots.outputs.misp.output_api
lookup: no
public: no
cache (redis db): none
description: Connect to a MISP instance and add event as MISPObject if not there already.
The PyMISP library >= 2.4.120 is required, see REQUIREMENTS.txt.
Configuration Parameters
Feed parameters (see above)
add_feed_provider_as_tag: boolean (use true when in doubt)
add_feed_name_as_tag: boolean (use true when in doubt)
misp_additional_correlation_fields: list of fields for which the correlation flags will be enabled (in addition to those which are in significant_fields)
misp_additional_tags: list of additional tags to set, which are not searched for when looking for duplicates
misp_key: string, API key for accessing MISP
misp_publish: boolean, if a new MISP event should be set to “publish”.
Expert setting as MISP may really make it “public”! (Use false when in doubt.)
misp_tag_for_bot: string, used to mark MISP events
misp_to_ids_fields: list of fields for which the to_ids flags will be set
misp_url: string, URL of the MISP server
significant_fields: list of intelmq field names
The significant_fields values will be searched for in all MISP attribute values; if all values are found in the same MISP event, no new MISP event will be created. Instead, if the existing MISP events have the same feed.provider and match closely, their timestamp will be updated.
If a new MISP event is inserted the significant_fields and the misp_additional_correlation_fields will be the attributes where correlation is enabled.
Make sure to build the IntelMQ botnet in a way that the rate of incoming events is one MISP can handle, as IntelMQ can process events much faster than MISP (which is by design, as MISP is meant for manual handling). Also use an expert bot to remove the fields of the IntelMQ events that you do not want to be inserted into MISP.
More details can be found in the docstring of output_api.py.
MongoDB¶
Saves events in a MongoDB either as a hierarchical structure or flat with full key names. time.observation and time.source are saved as datetime objects, not as ISO formatted strings.
Information
name: intelmq.bots.outputs.mongodb.output
lookup: no
public: yes
cache (redis db): none
description: MongoDB is the bot responsible for sending events to a MongoDB database
Configuration Parameters
collection: MongoDB collection
database: MongoDB database
db_user: database user that should be used if you enabled authentication
db_pass: password associated with db_user
host: MongoDB host (FQDN or IP)
port: MongoDB port, default: 27017
hierarchical_output: Boolean, default true. As MongoDB does not allow saving keys with dots, the dictionary is split into sub-dictionaries.
replacement_char: String (default ‘_’) used as replacement character for the dots in key names if hierarchical output is not used.
Installation Requirements
pip3 install pymongo>=2.7.1
The bot has been tested with pymongo versions 2.7.1, 3.4 and 3.10.1 (server versions 2.6.10 and 3.6.8).
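To illustrate what hierarchical output means, here is a minimal sketch (not IntelMQ's actual code) of how dotted keys can be nested for MongoDB:
def nest(flat: dict) -> dict:
    # split each dotted key and build nested dictionaries,
    # since MongoDB historically rejected dots in key names
    nested = {}
    for key, value in flat.items():
        node = nested
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested

print(nest({"source.ip": "192.0.2.1", "source.asn": 64496}))
# -> {'source': {'ip': '192.0.2.1', 'asn': 64496}}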
Redis¶
Information
name: intelmq.bots.outputs.redis.output
lookup: to the Redis server
public: yes
cache (redis db): none
description: Output Bot that sends events to a remote Redis server/queue.
Configuration Parameters
redis_db: remote server database, e.g.: 2
redis_password: remote server password
redis_queue: remote server list (queue), e.g.: “remote-server-queue”
redis_server_ip: remote server IP address, e.g.: 127.0.0.1
redis_server_port: remote server Port, e.g.: 6379
redis_timeout: Connection timeout, in milliseconds, e.g.: 50000
hierarchical_output: whether output should be sent in hierarchical JSON format (default: false)
with_type: Send the __type field (default: true)
Examples of usage
Can be used to send events to be processed in another system, e.g. to send events to Logstash.
In a multi-tenant installation it can be used to send events to an external/remote IntelMQ instance. Any expert bot queue can receive the events.
In a complex configuration it can be used to create logical sets in IntelMQ-Manager.
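On the receiving side, the queue can be consumed with any Redis client. A minimal sketch using the redis Python package, with the example connection values from above (queue name and database number are illustrative):
import json
import redis

r = redis.Redis(host="127.0.0.1", port=6379, db=2)
# the output bot pushes each event onto a Redis list; BLPOP blocks until one arrives
_, raw = r.blpop("remote-server-queue")
event = json.loads(raw)
print(event.get("source.ip"))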
Request Tracker¶
Information
name: intelmq.bots.outputs.rt.output
lookup: to the Request Tracker instance
public: yes
cache (redis db): none
description: Output Bot that creates Request Tracker tickets from events.
Description
The bot creates tickets in Request Tracker and uses event fields for the ticket body text. The bot follows the RTIR workflow:
create a ticket in the Incidents queue (or any other queue),
all event fields are included in the ticket body,
event attributes are assigned to the ticket's CFs according to the attribute mapping,
the ticket taxonomy can be assigned according to the CF mapping. If you use a taxonomy different from ENISA RSIT, consider using an extra attribute field and do the value mapping with the modify or sieve bot,
create a linked ticket in the Investigations queue, if these conditions are met:
the first ticket's destination was the Incidents queue,
source.abuse_contact is specified,
the description text is specified in the field appointed by the configuration,
RT/RTIR is supposed to send the relevant notifications via a script working on the condition "On Create",
the configuration option investigation_fields specifies which event fields have to be included in the investigation,
resolve the Incident ticket according to the configuration (the Investigation ticket status should depend on the RT script configuration).
Take extra caution not to flood your ticketing system with an enormous amount of tickets. Add extra filtering so that only critical events pass to RT, and/or deduplicate events.
Configuration Parameters
rt_uri, rt_user, rt_password, verify_cert: RT API endpoint connection details, string.
queue: ticket destination queue, string. If set to 'Incidents', an 'Investigations' ticket will be created if create_investigation is set to true.
CF_mapping: mapping of event attributes to ticket CFs, dictionary. E.g. {"event_description.text": "Description", "source.ip": "IP", "extra.classification.type": "Incident Type", "classification.taxonomy": "Classification"}
final_status: the final status for the created ticket, string. E.g. resolved if you want to resolve the created ticket. The linked Investigation ticket will be resolved automatically by RTIR scripts.
create_investigation: if an Investigation ticket should be created (in case of RTIR workflow). true or false, boolean.
investigation_fields: attributes to include into investigation ticket, comma-separated string. E.g. time.source,source.ip,source.port,source.fqdn,source.url,classification.taxonomy,classification.type,classification.identifier,event_description.url,event_description.text,malware.name,protocol.application,protocol.transport.
description_attr: which event attribute contains the text message being sent to the recipient, string. If it is not specified or not found in the event, the Investigation ticket is not going to be created. Example: extra.message.text.
REST API¶
Information
name: intelmq.bots.outputs.restapi.output
lookup: no
public: yes
cache (redis db): none
description: REST API is the bot responsible for sending events to a REST API listener via POST requests
Configuration Parameters
auth_token: the user name / HTTP header key
auth_token_name: the password / HTTP header value
auth_type: one of: “http_basic_auth”, “http_header”
hierarchical_output: boolean
host: destination URL
use_json: boolean
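For illustration, the two auth_type modes correspond roughly to the following requests; this is a sketch using the requests library, with placeholder URL, credentials and header name, not the bot's actual code:
import requests

event = {"source.ip": "192.0.2.1", "classification.type": "c2-server"}

# auth_type "http_basic_auth": auth_token is the user name, auth_token_name the password
requests.post("https://listener.example/events", json=event,
              auth=("myuser", "mypassword"))

# auth_type "http_header": auth_token is the header key, auth_token_name the header value
requests.post("https://listener.example/events", json=event,
              headers={"Authorization": "Token abcdef"})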
RPZ¶
The DNS RPZ functionality is a "DNS firewall". The bot generates a blocklist.
Information
name: intelmq.bots.outputs.rpz_file.output
lookup: no
public: yes
cache (redis db): none
description: Generate RPZ file
Configuration Parameters
cname: the CNAME target for listed entries, e.g. rpz.yourdomain.eu
organization_name: Your organisation name
rpz_domain: Information website about RPZ
hostmaster_rpz_domain: Technical website
rpz_email: Contact email
ttl: time to live
ncachttl: DNS negative caching TTL
serial: time stamp or another numbering scheme for the zone serial
refresh: refresh time
retry: retry time
expire: expiration time
test_domain: a test domain, added at the top of the generated RPZ file (after the header)
File example:
$TTL 3600
@ SOA rpz.yourdomain.eu. hostmaster.rpz.yourdomain.eu. 2105260601 60 60 432000 60
NS localhost.
;
; yourdomain.eu. CERT.XX Response Policy Zones (RPZ)
; Last updated: 2021-05-26 06:01:41 (UTC)
;
; Terms Of Use: https://rpz.yourdomain.eu
; For questions please contact rpz [at] yourdomain.eu
;
*.maliciousdomain.com CNAME rpz.yourdomain.eu.
*.secondmaliciousdomain.com CNAME rpz.yourdomain.eu.
Description
The prime motivation for creating this feature was to protect users from badness on the Internet related to known-malicious global identifiers such as host names, domain names, IP addresses, or nameservers. More information: https://dnsrpz.info
SMTP Output Bot¶
Sends a MIME Multipart message containing the text and the event as CSV for every single event.
Information
name: intelmq.bots.outputs.smtp.output
lookup: no
public: yes
cache (redis db): none
description: Sends events via SMTP
Configuration Parameters
fieldnames: a list of field names to be included in the email, comma-separated string or list of strings. If empty, no attachment is sent - this can be useful if the actual data is already in the body (parameter text) or the subject.
mail_from: string. Supports formatting, see below
mail_to: string of email addresses, comma separated. Supports formatting, see below
smtp_host: string
smtp_password: string or null, Password for authentication on your SMTP server
smtp_port: port
smtp_username: string or null, Username for authentication on your SMTP server
ssl: boolean
starttls: boolean
subject: string. Supports formatting, see below
text: string or null. Supports formatting, see below
For several string parameters you can use values from the event using the standard Python string format syntax. Access the event's values with {ev[source.ip]} and similar. Any non-existing fields will result in None. For example, to set the recipient(s) to the value given in the event's source.abuse_contact field, use this as the mail_to parameter: {ev[source.abuse_contact]}
Authentication is optional. If both username and password are given, these mechanisms are tried: CRAM-MD5, PLAIN, and LOGIN.
Client certificates are not supported. If http_verify_cert is true, TLS certificates are checked.
SQL¶
Information
name: intelmq.bots.outputs.sql.output
lookup: no
public: yes
cache (redis db): none
description: SQL is the bot responsible for sending events to a PostgreSQL, SQLite, or MSSQL database, e.g. the IntelMQ EventDB
notes: When activating autocommit, transactions are not used: http://initd.org/psycopg/docs/connection.html#connection.autocommit
Configuration Parameters
The parameters marked with ‘PostgreSQL’ will be sent to libpq via psycopg2. Check the libpq parameter documentation for the versions you are using.
autocommit: psycopg’s autocommit mode, optional, default True
connect_timeout: Database connect_timeout, optional, default 5 seconds
engine: ‘postgresql’, ‘sqlite’, or ‘mssql’
database: Database or SQLite file
host: Database host
jsondict_as_string: save JSONDict fields as JSON string, boolean. Default: true (like in versions before 1.1)
port: Database port
user: Database user
password: Database password
sslmode: Database sslmode, can be ‘disable’, ‘allow’, ‘prefer’ (default), ‘require’, ‘verify-ca’ or ‘verify-full’. See postgresql docs: https://www.postgresql.org/docs/current/static/libpq-connect.html#libpq-connect-sslmode
table: name of the database table into which events are to be inserted
fields: list of fields to read from the event. If None, read all fields
reconnect_delay: number of seconds to wait before reconnecting in case of an error
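A quick way to verify the connection parameters before starting the bot is a short psycopg2 session; host, credentials and table name below are example values matching the setup described in the PostgreSQL section:
import psycopg2

# same parameters as in the bot configuration above (example values)
conn = psycopg2.connect(host="localhost", port=5432, user="intelmq",
                        password="secret", dbname="intelmq-events",
                        sslmode="prefer", connect_timeout=5)
with conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM events")  # 'events' is the table created by intelmq_psql_initdb
    print(cur.fetchone()[0])
conn.close()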
PostgreSQL¶
You have two basic choices to run PostgreSQL:
on the same machine as intelmq, then you could use Unix sockets if available on your platform
on a different machine, in which case you would need to use a TCP connection and make sure you give the right connection parameters to each psql or client call.
Make sure to consult your PostgreSQL documentation about how to allow network connections and authentication in case 2.
PostgreSQL Version
Any supported version of PostgreSQL should work (v>=9.2 as of Oct 2016) [1].
If you use PostgreSQL server v >= 9.4, it gives you the possibility to use the time-zone formatting string “OF” for date-times and the GiST index for the CIDR type. This may be useful depending on how you plan to use the events that this bot writes into the database.
How to install
Use intelmq_psql_initdb to create initial SQL statements from harmonization.conf. The script will create the required table layout and save it as /tmp/initdb.sql
You need a PostgreSQL database-user to own the result database. The recommendation is to use the name intelmq. There may already be such a user for the PostgreSQL database-cluster to be used by other bots. (For example from setting up the expert/certbund_contact bot.)
Therefore, if still necessary, create the database-user as the PostgreSQL superuser, which is usually done via the system user postgres:
createuser --no-superuser --no-createrole --no-createdb --encrypted --pwprompt intelmq
Create the new database:
createdb --encoding='utf-8' --owner=intelmq intelmq-events
(The encoding parameter should ensure the right encoding on platforms where this is not the default.)
Now initialize it as database-user intelmq (in this example a network connection to localhost is used, so you would get to test if the user intelmq can authenticate):
psql -h localhost intelmq-events intelmq </tmp/initdb.sql
PostgreSQL and null characters
While null characters (0, not SQL "NULL") in TEXT and JSON/JSONB fields are valid, data containing null characters can cause trouble in some combinations of clients, servers and their settings. To prevent unhandled errors and data which can't be inserted into the database, all null characters are escaped (\u0000) before insertion.
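A sketch of what this escaping step amounts to (the actual implementation may differ):
def escape_nulls(value: str) -> str:
    # replace the literal NUL character with the six-character sequence \u0000
    return value.replace("\x00", "\\u0000")

print(escape_nulls("bad\x00data"))  # -> bad\u0000data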
SQLite¶
Similarly to PostgreSQL, you can use intelmq_psql_initdb to create initial SQL statements from harmonization.conf. The script will create the required table layout and save it as /tmp/initdb.sql.
Create the new database (you can ignore all errors since SQLite doesn’t know all SQL features generated for PostgreSQL):
sqlite3 your-db.db
sqlite> .read /tmp/initdb.sql
Then, set the database parameter to the your-db.db file path.
MSSQL¶
For MSSQL support, the library pymssql>=2.2 is required.
STOMP¶
Information
name: intelmq.bots.outputs.stomp.output
lookup: yes
public: yes
cache (redis db): none
description: This output bot will push data to any STOMP stream. STOMP stands for Streaming Text Oriented Messaging Protocol. See: https://en.wikipedia.org/wiki/Streaming_Text_Oriented_Messaging_Protocol
Requirements
Install the stomp.py library, e.g. apt install python3-stomp.py or pip install stomp.py.
You need a CA certificate, client certificate and key file from the organization / server you are connecting to. Also you will need a so called “exchange point”.
Configuration Parameters
exchange: the exchange to push to
heartbeat: default: 60000
message_hierarchical_output: Boolean, default: false
message_jsondict_as_string: Boolean, default: false
message_with_type: Boolean, default: false
port: Integer, default: 61614
server: Host or IP address of the STOMP server
single_key: Boolean or string (field name), default: false
ssl_ca_certificate: path to CA file
ssl_client_certificate: path to client cert file
ssl_client_certificate_key: path to client cert key file
TCP¶
Information
name: intelmq.bots.outputs.tcp.output
lookup: no
public: yes
cache (redis db): none
description: TCP is the bot responsible for sending events to a TCP port (Splunk, another IntelMQ instance, etc.).
Multithreading is disabled for this bot.
Configuration Parameters
counterpart_is_intelmq: Boolean. If you are sending to an IntelMQ TCP collector, set this to true; otherwise (e.g. with filebeat), set it to false.
ip: IP of destination server
hierarchical_output: true for a nested JSON, false for a flat JSON (when sending to a TCP collector).
port: port of destination server
separator: separator of messages, e.g. "\n", optional. When sending to a TCP collector, this parameter shouldn't be present; in that case, the output waits until every message is acknowledged by the "Ok" message the TCP collector bot implements.
Sending to an IntelMQ TCP collector
If you intend to link two IntelMQ instance via TCP, set the parameter counterpart_is_intelmq to true. The bot then awaits an “Ok” message to be received after each message is sent. The TCP collector just sends “Ok” after every message it gets.
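For testing the counterpart side, here is a minimal sketch of a receiver that mimics this acknowledgment protocol; it is not the actual collector implementation and the framing (one event per recv) is simplified:
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
srv.bind(("127.0.0.1", 5000))  # example address and port
srv.listen(1)
conn, addr = srv.accept()
while True:
    data = conn.recv(65535)  # simplified: assumes one event per recv call
    if not data:
        break
    print(data.decode())
    conn.sendall(b"Ok")      # acknowledge, as the IntelMQ TCP collector does
conn.close()
srv.close()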
Templated SMTP¶
Sends a MIME Multipart message built from an event and static text using Jinja2 templates.
Information
name: intelmq.bots.outputs.templated_smtp.output
lookup: no
public: yes
cache (redis db): none
description: Sends events via SMTP
Requirements
Install the required jinja2 library:
pip3 install -r intelmq/bots/outputs/templated_smtp/REQUIREMENTS.txt
Configuration Parameters
attachments: list of objects with structure:
content-type: string, templated, content-type to use.
text: string, templated, attachment text.
name: string, templated, filename of attachment.
body: string, optional, templated, body text. The default body template prints every field in the event except ‘raw’, in undefined order, one field per line, as “field: value”.
mail_from: string, templated, sender address.
mail_to: string, templated, recipient addresses, comma-separated.
smtp_host: string, optional, default “localhost”, hostname of SMTP server.
smtp_password: string, default null, password (if any) for authenticated SMTP.
smtp_port: integer, default 25, TCP port to connect to.
smtp_username: string, default null, username (if any) for authenticated SMTP.
tls: boolean, default false, whether to use SMTPS. If true, also set smtp_port to the SMTPS port.
starttls: boolean, default true, whether to use opportunistic STARTTLS over SMTP.
subject: string, optional, default “IntelMQ event”, templated, e-mail subject line.
verify_cert: boolean, default true, whether to verify the server certificate in STARTTLS or SMTPS.
Authentication is attempted only if both username and password are specified.
Templates are in Jinja2 format with the event provided in the variable “event”. E.g.:
mail_to: "{{ event['source.abuse_contact'] }}"
See the Jinja2 documentation at https://jinja.palletsprojects.com/ .
As an extension to the Jinja2 environment, the function “from_json” is available for parsing JSON strings into Python structures. This is useful if you want to handle complicated structures in the “output” field of an event. In that case, you would start your template with a line like:
{%- set output = from_json(event['output']) %}
and can then use “output” as a regular Python object in the rest of the template.
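Templates can also be tried out outside IntelMQ. The following sketch emulates the from_json extension by registering json.loads as a Jinja2 global (an assumption about how IntelMQ provides it; the rendering behaviour is what matters):
import json
from jinja2 import Environment

env = Environment()
env.globals["from_json"] = json.loads  # emulates IntelMQ's from_json extension

template = env.from_string(
    "{%- set output = from_json(event['output']) %}"
    "Ports: {{ output['ports'] | join(', ') }}"
)
print(template.render(event={"output": '{"ports": [22, 80]}'}))
# -> Ports: 22, 80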
Attachments are template strings, especially useful for sending structured data. E.g. to send a JSON document including “malware.name” and all other fields starting with “source.”:
attachments:
- content-type: application/json
text: |
{
"malware": "{{ event['malware.name'] }}",
{%- set comma = joiner(", ") %}
{%- for key in event %}
{%- if key.startswith('source.') %}
{{ comma() }}"{{ key }}": "{{ event[key] }}"
{%- endif %}
{%- endfor %}
}
name: report.json
You are responsible for making sure that the text produced by the template is valid according to the content-type.
If you are migrating from the SMTP output bot that produced CSV format attachments, use the following configuration to produce a matching format:
attachments:
- content-type: text/csv
text: |
{%- set fields = ["classification.taxonomy", "classification.type", "classification.identifier", "source.ip", "source.asn", "source.port"] %}
{%- set sep = joiner(";") %}
{%- for field in fields %}{{ sep() }}{{ field }}{%- endfor %}
{% set sep = joiner(";") %}
{%- for field in fields %}{{ sep() }}{{ event[field] }}{%- endfor %}
name: event.csv
Touch¶
Information
name: intelmq.bots.outputs.touch.output
lookup: no
public: yes
cache (redis db): none
description: Touches a file for every event received.
Configuration Parameters
path: Path to the file to touch.
UDP¶
Information
name: intelmq.bots.outputs.udp.output
lookup: no
public: yes
cache (redis db): none
description: Output Bot that sends events to a remote UDP server.
Multithreading is disabled for this bot.
Configuration Parameters
field_delimiter: If the format is ‘delimited’ this will be added between fields. String, default: “|”
format: can be 'json' or 'delimited'. The JSON format outputs the event 'as-is'. Delimited will deconstruct the event and print each field:value pair separated by the field delimiter. See examples below.
header: Header text to be sent in the UDP datagram, string.
keep_raw_field: boolean, default: false
udp_host: destination server's hostname or IP address
udp_port: Destination port
Examples of usage
Consider the following event:
{"raw": "MjAxNi8wNC8yNV8xMTozOSxzY2hpenppbm8ub21hcmF0aG9uLmNvbS9na0NDSnVUSE0vRFBlQ1pFay9XdFZOSERLbC1tWFllRk5Iai8sODUuMjUuMTYwLjExNCxzdGF0aWMtaXAtODUtMjUtMTYwLTExNC5pbmFkZHIuaXAtcG9vbC5jb20uLEFuZ2xlciBFSywtLDg5NzI=", "source": {"asn": 8972, "ip": "85.25.160.114", "url": "http://schizzino.omarathon.com/gkCCJuTHM/DPeCZEk/WtVNHDKl-mXYeFNHj/", "reverse_dns": "static-ip-85-25-160-114.inaddr.ip-pool.com"}, "classification": {"type": "malware-distribution"}, "event_description": {"text": "Angler EK"}, "feed": {"url": "http://www.malwaredomainlist.com/updatescsv.php", "name": "Malware Domain List", "accuracy": 100.0}, "time": {"observation": "2016-04-29T10:59:34+00:00", "source": "2016-04-25T11:39:00+00:00"}}
With the following parameters:
field_delimiter: |
format: json
header: header example
keep_raw_field: true
udp_host: 127.0.0.1
udp_port: 514
Resulting line in syslog:
Apr 29 11:01:29 header example {"raw": "MjAxNi8wNC8yNV8xMTozOSxzY2hpenppbm8ub21hcmF0aG9uLmNvbS9na0NDSnVUSE0vRFBlQ1pFay9XdFZOSERLbC1tWFllRk5Iai8sODUuMjUuMTYwLjExNCxzdGF0aWMtaXAtODUtMjUtMTYwLTExNC5pbmFkZHIuaXAtcG9vbC5jb20uLEFuZ2xlciBFSywtLDg5NzI=", "source": {"asn": 8972, "ip": "85.25.160.114", "url": "http://schizzino.omarathon.com/gkCCJuTHM/DPeCZEk/WtVNHDKl-mXYeFNHj/", "reverse_dns": "static-ip-85-25-160-114.inaddr.ip-pool.com"}, "classification": {"type": "malware-distribution"}, "event_description": {"text": "Angler EK"}, "feed": {"url": "http://www.malwaredomainlist.com/updatescsv.php", "name": "Malware Domain List", "accuracy": 100.0}, "time": {"observation": "2016-04-29T10:59:34+00:00", "source": "2016-04-25T11:39:00+00:00"}}
With the following parameters:
field_delimiter: |
format: delimited
header: IntelMQ-event
keep_raw_field: false
udp_host: 127.0.0.1
udp_port: 514
Resulting line in syslog:
Apr 29 11:17:47 localhost IntelMQ-event|source.ip: 85.25.160.114|time.source:2016-04-25T11:39:00+00:00|feed.url:http://www.malwaredomainlist.com/updatescsv.php|time.observation:2016-04-29T11:17:44+00:00|source.reverse_dns:static-ip-85-25-160-114.inaddr.ip-pool.com|feed.name:Malware Domain List|event_description.text:Angler EK|source.url:http://schizzino.omarathon.com/gkCCJuTHM/DPeCZEk/WtVNHDKl-mXYeFNHj/|source.asn:8972|classification.type:malware-distribution|feed.accuracy:100.0
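To inspect what arrives on the other side without a syslog daemon, a minimal sketch of a UDP listener (address and port are the example values from above):
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("127.0.0.1", 514))  # binding to 514 usually requires root; use e.g. 5140 otherwise
while True:
    data, addr = sock.recvfrom(65535)
    print(addr, data.decode())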
intelmqctl documentation¶
Introduction¶
intelmqctl is the main tool to handle an intelmq installation. It handles the bots themselves and provides some tools to manage the installation.
Output type¶
intelmqctl can be used as a command line tool, as a library and as a tool by other programs. If called directly, it will print all output to the console (stderr). If used as a Python library, the Python types themselves are returned. The third option is to use machine-readable JSON as output (used by other managing tools).
Manage individual bots¶
Like all init systems, intelmqctl has the methods start, stop, restart, reload and status.
start¶
This will start the bot with the ID file-output. A file with its PID will be created in /opt/intelmq/var/run/[bot-id].pid.
> intelmqctl start file-output
Starting file-output...
file-output is running.
If the bot is already running, it won’t be started again:
> intelmqctl start file-output
file-output is running.
stop¶
If the PID file does exist, a SIGINT will be sent to the process. After 0.25s we check if the process is running. If not, the PID file will be removed.
> intelmqctl stop file-output
Stopping file-output...
file-output is stopped.
If there’s no running bot, there’s nothing to do.
> intelmqctl stop file-output
file-output was NOT RUNNING.
If the bot did not stop in 0.25s, intelmqctl will say it’s still running:
> intelmqctl stop file-output
file-output is still running
status¶
Checks for the PID file and if the process with the given PID is alive. If the PID file exists, but the process does not exist, it will be removed.
> intelmqctl status file-output
file-output is stopped.
> intelmqctl start file-output
Starting file-output...
file-output is running.
> intelmqctl status file-output
file-output is running.
restart¶
The same as stop and start consecutively.
> intelmqctl restart file-output
Stopping file-output...
file-output is stopped.
Starting file-output...
file-output is running.
reload¶
Sends a SIGHUP to the bot, which will then reload the configuration.
> intelmqctl reload file-output
Reloading file-output ...
file-output is running.
If the bot is not running, we can’t reload it:
> intelmqctl reload file-output
file-output was NOT RUNNING.
run¶
Run a bot directly for debugging purposes.
If launched with no arguments, the bot will call its init method and start processing messages as usual, but you can see everything that happens.
> intelmqctl run file-output
file-output: RestAPIOutputBot initialized with id file-output and version 3.5.2 as process 12345.
file-output: Bot is starting.
file-output: Loading source pipeline and queue 'file-output-queue'.
file-output: Connected to source queue.
file-output: No destination queues to load.
file-output: Bot initialization completed.
file-output: Waiting for incoming message.
Should you get lost at any time, just use --help after any argument for further explanation.
> intelmqctl run file-output --help
Note that if another instance of the bot is running, only a warning will be displayed.
> intelmqctl run file-output
Main instance of the bot is running in the background. You may want to launch: intelmqctl stop file-output
You can set the log level with the -l flag, e.g. -l DEBUG. For the ‘console’ subcommand, ‘DEBUG’ is the default.
console¶
If launched with the console argument, you get a pdb live console, or an ipdb or pudb console if previously installed (e.g. pip3 install ipdb --user).
> intelmqctl run file-output console
*** Using console ipdb. Please use 'self' to access to the bot instance properties. ***
ipdb> self. ...
You may specify the desired console in the next argument.
> intelmqctl run file-output console pudb
message¶
Operate directly with the input / output pipelines.
If get is the parameter, you see the message that waits in the input (source or internal) queue. If the argument is pop, the message gets popped as well.
> intelmqctl run file-output message get
file-output: Waiting for a message to get...
{
"classification.type": "c&c",
"feed.url": "https://example.com",
"raw": "1233",
"source.ip": "1.2.3.4",
"time.observation": "2017-05-17T22:00:33+00:00",
"time.source": "2017-05-17T22:00:32+00:00"
}
To send a message directly to the bot's output queue, just as it would be sent by self.send_message() in the bot's process() method, use the send argument. In our case of file-output, it has no destination queue, so nothing happens.
> intelmqctl run file-output message send '{"time.observation": "2017-05-17T22:00:33+00:00", "time.source": "2017-05-17T22:00:32+00:00"}'
file-output: Bot has no destination queues.
Note: if you would like to know the possible parameters of a message, supply a wrong one; you will then be asked whether you want to list all fields of the current bot's harmonization.
process¶
With no other arguments, the bot's process() method will be run one time.
> intelmqctl run file-output process
file-output: Bot is starting.
file-output: Bot initialization completed.
file-output: Processing...
file-output: Waiting for incoming message.
file-output: Received message {'raw': '1234'}.
If run with the --dryrun|-d flag, the message never really gets popped from the source or internal pipeline, nor sent to the output pipeline. In addition, you receive a note about the exact moment the message would get sent or acknowledged. If the message would be sent to a non-default path, the name of this path is printed on the console.
> intelmqctl run file-output process -d
file-output: * Dryrun only, no message will be really sent through.
...
file-output: DRYRUN: Message would be acknowledged now!
You may trick the bot into processing a JSON message instead of the message in its pipeline with the --msg|-m flag.
> intelmqctl run file-output process -m '{"source.ip":"1.2.3.4"}'
file-output: * Message from cli will be used when processing.
...
If you wish to display the processed message as well, use the --show-sent|-s flag. Then, if sent through (either with --dryrun or without), the message gets displayed as well.
disable¶
Sets the enabled flag in the runtime configuration of the bot to false. By default, all bots are enabled.
Example output:
> intelmqctl status file-output
file-output is stopped.
> intelmqctl disable file-output
> intelmqctl status file-output
file-output is disabled.
enable¶
Sets the enabled flag in the runtime configuration of the bot to true.
Example output:
> intelmqctl status file-output
file-output is disabled.
> intelmqctl enable file-output
> intelmqctl status file-output
file-output is stopped.
Manage the botnet¶
In IntelMQ, the botnet is the set of all currently configured and enabled bots.
All configured bots have their configuration in runtime.yaml.
By default, all bots are enabled. To disable a bot set enabled to false.
Also see Bots inventory and Runtime Configuration.
If no bot id is given, the command applies to all bots / the botnet. All commands except the start action are applied to all bots, but only enabled bots are started.
In the examples below, a very minimal botnet is used.
start¶
The start action applies to all bots which are enabled.
> intelmqctl start
Starting abusech-domain-parser...
abusech-domain-parser is running.
Starting abusech-feodo-domains-collector...
abusech-feodo-domains-collector is running.
Starting deduplicator-expert...
deduplicator-expert is running.
file-output is disabled.
Botnet is running.
As we can see, file-output is disabled and thus has not been started. You can always explicitly start disabled bots.
stop¶
The stop action applies to all bots. Assume that all bots have been running:
> intelmqctl stop
Stopping Botnet...
Stopping abusech-domain-parser...
abusech-domain-parser is stopped.
Stopping abusech-feodo-domains-collector...
abusech-feodo-domains-collector is stopped.
Stopping deduplicator-expert...
deduplicator-expert is stopped.
Stopping file-output...
file-output is stopped.
Botnet is stopped.
status¶
With this command we can see the status of all configured bots. Here, the botnet was started beforehand:
> intelmqctl status
abusech-domain-parser is running.
abusech-feodo-domains-collector is running.
deduplicator-expert is running.
file-output is disabled.
And if the disabled bot has also been started:
> intelmqctl status
abusech-domain-parser is running.
abusech-feodo-domains-collector is running.
deduplicator-expert is running.
file-output is running.
If the botnet is stopped, the output looks like this:
> intelmqctl status
abusech-domain-parser is stopped.
abusech-feodo-domains-collector is stopped.
deduplicator-expert is stopped.
file-output is disabled.
restart¶
The same as start and stop consecutively.
reload¶
The same as reload of every bot.
enable / disable¶
The sub commands enable and disable set the corresponding flags in runtime.yaml.
> intelmqctl status
file-output is stopped.
malware-domain-list-collector is stopped.
malware-domain-list-parser is stopped.
> intelmqctl disable file-output
> intelmqctl status
file-output is disabled.
malware-domain-list-collector is stopped.
malware-domain-list-parser is stopped.
> intelmqctl enable file-output
> intelmqctl status
file-output is stopped.
malware-domain-list-collector is stopped.
malware-domain-list-parser is stopped.
List bots¶
intelmqctl list bots lists all configured bots and their descriptions.
List queues¶
intelmqctl list queues shows all queues which are currently in use according to the configuration, and how many events are in them:
> intelmqctl list queues
abusech-domain-parser-queue - 0
abusech-domain-parser-queue-internal - 0
deduplicator-expert-queue - 0
deduplicator-expert-queue-internal - 0
file-output-queue - 234
file-output-queue-internal - 0
Use the -q or --quiet flag to only show non-empty queues:
> intelmqctl list queues -q
file-output-queue - 234
The --sum or --count flag will show the sum of events on all queues:
> intelmqctl list queues --sum
42
Log¶
intelmqctl can show the last log lines for a bot, filtered by the log level.
See the help page for more information.
Check¶
This command will do various sanity checks on the installation and especially the configuration.
Orphaned Queues¶
The intelmqctl check tool can search for orphaned queues. "Orphaned queues" are queues that have been used in the past and are no longer in use. For example, you had a bot which you removed or renamed afterwards, but there were still messages in its source queue. The source queue won't be renamed automatically and is now disconnected. As this queue is no longer configured, it won't show up in the list of IntelMQ's queues either. In case you are using Redis as message broker, you can use the redis-cli tool to examine or remove these queues:
redis-cli -n 2
keys * # lists all existing non-empty queues
llen [queue-name] # shows the length of the queue [queue-name]
lindex [queue-name] [index] # show the [index]'s message of the queue [queue-name]
del [queue-name] # remove the queue [queue-name]
To ignore certain queues in this check, you can set the parameter intelmqctl_check_orphaned_queues_ignore in the defaults configuration file. For example:
"intelmqctl_check_orphaned_queues_ignore": ["Taichung-Parser"],
Configuration upgrade¶
The intelmqctl upgrade-config command upgrades the configuration from previous versions to the current one. It keeps track of previously installed versions and the result of all "upgrade functions" in the "state file", located at $var_state_path/state.json (/opt/intelmq/var/lib/state.json or /var/lib/intelmq/state.json).
This function has been introduced in version 2.0.1.
It makes backups of all changed files itself before every run. Backups are overwritten if they already exist, so make sure to always have a backup of your configuration, just in case.
Exit code¶
In case of errors or unsuccessful operations, the exit code is higher than 0. For example, when running intelmqctl start and one enabled bot is not running, the exit code is 1. The same is valid for e.g. intelmqctl status, which can be used for monitoring, and all other operations.
Known issues¶
The currently implemented process management using PID files is very error-prone.
Data Feeds¶
The available feeds are grouped by the provider of the feeds. For each feed, the collector and parser that can be used are documented, as well as any feed-specific parameters. To add feeds to this list, add them to intelmq/etc/feeds.yaml and then rebuild the documentation.
Abuse.ch¶
Feodo Tracker¶
Public: yes
Revision: 2022-11-15
Documentation: https://feodotracker.abuse.ch/
Description: List of botnet Command & Control servers (C&Cs) tracked by Feodo Tracker, associated with Dridex and Emotet (aka Heodo).
Additional Information: https://feodotracker.abuse.ch/ The data in the column Last Online is used for time.source if available, with 00:00 as time. Otherwise first seen is used as time.source.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://feodotracker.abuse.ch/downloads/ipblocklist.json
name: Feodo Tracker
provider: Abuse.ch
rate_limit: 3600
Parser
Module: intelmq.bots.parsers.abusech.parser_feodotracker
Configuration Parameters:
URLhaus¶
Public: yes
Revision: 2020-07-07
Documentation: https://urlhaus.abuse.ch/feeds/
Description: URLhaus is a project from abuse.ch with the goal of sharing malicious URLs that are being used for malware distribution. URLhaus offers a country, ASN (AS number) and Top Level Domain (TLD) feed for network operators / Internet Service Providers (ISPs), Computer Emergency Response Teams (CERTs) and domain registries.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://urlhaus.abuse.ch/feeds/tld/<TLD>/, https://urlhaus.abuse.ch/feeds/country/<CC>/, or https://urlhaus.abuse.ch/feeds/asn/<ASN>/
name: URLhaus
provider: Abuse.ch
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.generic.parser_csv
Configuration Parameters:
columns: ["time.source", "source.url", "status", "classification.type|__IGNORE__", "source.fqdn|__IGNORE__", "source.ip", "source.asn", "source.geolocation.cc"]
default_url_protocol: http://
delimiter: ,
skip_header: False
type_translation: {"malware_download": "malware-distribution"}
AlienVault¶
OTX¶
Public: no
Revision: 2018-01-20
Documentation: https://otx.alienvault.com/
Description: AlienVault OTX Collector is the bot responsible for getting the report through the API. Reports can vary according to subscriptions.
Collector
Module: intelmq.bots.collectors.alienvault_otx.collector
Configuration Parameters:
api_key: {{ your API key }}
name: OTX
provider: AlienVault
Parser
Module: intelmq.bots.parsers.alienvault.parser_otx
Configuration Parameters:
Reputation List¶
Public: yes
Revision: 2018-01-20
Description: List of malicious IPs.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://reputation.alienvault.com/reputation.data
name: Reputation List
provider: AlienVault
rate_limit: 3600
Parser
Module: intelmq.bots.parsers.alienvault.parser
Configuration Parameters:
AnubisNetworks¶
Cyberfeed Stream¶
Public: no
Revision: 2020-06-15
Documentation: https://www.anubisnetworks.com/ https://www.bitsight.com/
Description: Fetches and parses the Cyberfeed data stream.
Collector
Module: intelmq.bots.collectors.http.collector_http_stream
Configuration Parameters:
http_url: https://prod.cyberfeed.net/stream?key={{ your API key }}
name: Cyberfeed Stream
provider: AnubisNetworks
strip_lines: true
Parser
Module: intelmq.bots.parsers.anubisnetworks.parser
Configuration Parameters:
use_malware_familiy_as_classification_identifier: True
Bambenek¶
C2 Domains¶
Public: no
Revision: 2018-01-20
Documentation: https://osint.bambenekconsulting.com/feeds/
Description: Master Feed of known, active and non-sinkholed C&Cs domain names. Requires access credentials.
Additional Information: License: https://osint.bambenekconsulting.com/license.txt
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_password: __PASSWORD__
http_url: https://faf.bambenekconsulting.com/feeds/c2-dommasterlist.txt
http_username: __USERNAME__
name: C2 Domains
provider: Bambenek
rate_limit: 3600
Parser
Module: intelmq.bots.parsers.bambenek.parser
Configuration Parameters:
C2 IPs¶
Public: no
Revision: 2018-01-20
Documentation: https://osint.bambenekconsulting.com/feeds/
Description: Master Feed of known, active and non-sinkholed C&Cs IP addresses. Requires access credentials.
Additional Information: License: https://osint.bambenekconsulting.com/license.txt
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_password: __PASSWORD__
http_url: https://faf.bambenekconsulting.com/feeds/c2-ipmasterlist.txt
http_username: __USERNAME__
name: C2 IPs
provider: Bambenek
rate_limit: 3600
Parser
Module: intelmq.bots.parsers.bambenek.parser
Configuration Parameters:
DGA Domains¶
Public: yes
Revision: 2018-01-20
Documentation: https://osint.bambenekconsulting.com/feeds/
Description: Domain feed of known DGA domains from -2 to +3 days
Additional Information: License: https://osint.bambenekconsulting.com/license.txt
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://faf.bambenekconsulting.com/feeds/dga-feed.txt
name: DGA Domains
provider: Bambenek
rate_limit: 3600
Parser
Module: intelmq.bots.parsers.bambenek.parser
Configuration Parameters:
Benkow¶
Malware Panels Tracker¶
Public: yes
Revision: 2022-11-16
Description: Benkow Panels Tracker is a list of fresh panels from various malware families. The feed is available on the webpage: http://benkow.cc/passwords.php
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: http://benkow.cc/export.php
name: Malware Panels Tracker
provider: Benkow
Parser
Module: intelmq.bots.parsers.generic.parser_csv
Configuration Parameters:
columns: ["__IGNORE__", "malware.name", "source.url", "source.fqdn|source.ip", "time.source"]
columns_required: [false, true, true, false, true]
defaults_fields: {'classification.type': 'c2-server'}
delimiter: ;
skip_header: True
Blocklist.de¶
Apache¶
Public: yes
Revision: 2018-01-20
Documentation: http://www.blocklist.de/en/export.html
Description: Blocklist.DE Apache Collector is the bot responsible for getting the report from the source of information. All IP addresses which have been reported within the last 48 hours as having run attacks on the services Apache, Apache-DDOS and RFI-Attacks.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://lists.blocklist.de/lists/apache.txt
name: Apache
provider: Blocklist.de
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.blocklistde.parser
Configuration Parameters:
Bots¶
Public: yes
Revision: 2018-01-20
Documentation: http://www.blocklist.de/en/export.html
Description: Blocklist.DE Bots Collector is the bot responsible for getting the report from the source of information. All IP addresses which have been reported within the last 48 hours as having run attacks as RFI-Attacks, REG-Bots, IRC-Bots or BadBots (BadBots = has posted a spam comment on an open forum or wiki).
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://lists.blocklist.de/lists/bots.txt
name: Bots
provider: Blocklist.de
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.blocklistde.parser
Configuration Parameters:
Brute-force Logins¶
Public: yes
Revision: 2018-01-20
Documentation: http://www.blocklist.de/en/export.html
Description: Blocklist.DE Brute-force Login Collector is the bot responsible for getting the report from the source of information. All IPs which attack Joomla, WordPress and other web logins with brute-force logins.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://lists.blocklist.de/lists/bruteforcelogin.txt
name: Brute-force Logins
provider: Blocklist.de
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.blocklistde.parser
Configuration Parameters:
FTP¶
Public: yes
Revision: 2018-01-20
Documentation: http://www.blocklist.de/en/export.html
Description: Blocklist.DE FTP Collector is the bot responsible for getting the report from the source of information. All IP addresses which have been reported within the last 48 hours for attacks on the service FTP.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://lists.blocklist.de/lists/ftp.txt
name: FTP
provider: Blocklist.de
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.blocklistde.parser
Configuration Parameters:
IMAP¶
Public: yes
Revision: 2018-01-20
Documentation: http://www.blocklist.de/en/export.html
Description: Blocklist.DE IMAP Collector is the bot responsible for getting the report from the source of information. All IP addresses which have been reported within the last 48 hours for attacks on services like IMAP, SASL, POP3, etc.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://lists.blocklist.de/lists/imap.txt
name: IMAP
provider: Blocklist.de
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.blocklistde.parser
Configuration Parameters:
IRC Bots¶
Public: yes
Revision: 2018-01-20
Documentation: http://www.blocklist.de/en/export.html
Description: No description provided by feed provider.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://lists.blocklist.de/lists/ircbot.txt
name: IRC Bots
provider: Blocklist.de
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.blocklistde.parser
Configuration Parameters:
Mail¶
Public: yes
Revision: 2018-01-20
Documentation: http://www.blocklist.de/en/export.html
Description: Blocklist.DE Mail Collector is the bot responsible for getting the report from the source of information. All IP addresses which have been reported within the last 48 hours as having run attacks on the services Mail and Postfix.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://lists.blocklist.de/lists/mail.txt
name: Mail
provider: Blocklist.de
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.blocklistde.parser
Configuration Parameters:
SIP¶
Public: yes
Revision: 2018-01-20
Documentation: http://www.blocklist.de/en/export.html
Description: Blocklist.DE SIP Collector is the bot responsible for getting the report from the source of information. All IP addresses that tried to log in to a SIP, VoIP or Asterisk server and are included in the IPs list from http://www.infiltrated.net/ (Twitter).
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://lists.blocklist.de/lists/sip.txt
name: SIP
provider: Blocklist.de
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.blocklistde.parser
Configuration Parameters:
SSH¶
Public: yes
Revision: 2018-01-20
Documentation: http://www.blocklist.de/en/export.html
Description: Blocklist.DE SSH Collector is the bot responsible for getting the report from the source of information. All IP addresses which have been reported within the last 48 hours as having run attacks on the service SSH.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://lists.blocklist.de/lists/ssh.txt
name: SSH
provider: Blocklist.de
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.blocklistde.parser
Configuration Parameters:
Strong IPs¶
Public: yes
Revision: 2018-01-20
Documentation: http://www.blocklist.de/en/export.html
Description: Blocklist.DE Strong IPs Collector is the bot responsible for getting the report from the source of information. All IPs which are older than 2 months and have more than 5,000 attacks.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://lists.blocklist.de/lists/strongips.txt
name: Strong IPs
provider: Blocklist.de
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.blocklistde.parser
Configuration Parameters:
Blueliv¶
CrimeServer¶
Public: no
Revision: 2018-01-20
Documentation: https://www.blueliv.com/
Description: Blueliv Crimeserver Collector is the bot responsible for getting the report through the API.
Additional Information: The service uses a different API for free users and paying subscribers. In the 'CrimeServer' feed the difference lies in the data points present in the feed. The non-free API available from Blueliv contains, for this specific feed, the following extra fields not present in the free API: "_id" - internal unique ID; "subType" - subtype of the Crime Server; "countryName" - country name where the Crime Server is located, in English; "city" - city where the Crime Server is located; "domain" - domain of the Crime Server; "host" - host of the Crime Server; "createdAt" - date when the Crime Server was added to the Blueliv CrimeServer database; "asnCidr" - range of IPs that belong to an ISP (registered via an Autonomous System Number (ASN)); "asnId" - identifier of an ISP registered via ASN; "asnDesc" - description of the ISP registered via ASN.
Collector
Module: intelmq.bots.collectors.blueliv.collector_crimeserver
Configuration Parameters:
api_key: __APIKEY__
name: CrimeServer
provider: Blueliv
rate_limit: 3600
Parser
Module: intelmq.bots.parsers.blueliv.parser_crimeserver
Configuration Parameters:
CERT-Bund¶
CB-Report Malware infections via IMAP¶
Public: no
Revision: 2020-08-20
Description: CERT-Bund sends reports of malware-infected hosts.
Additional Information: Traffic from malware related hosts contacting command-and-control servers is caught and sent to national CERT teams. There are two e-mail feeds with identical CSV structure – one reports on general malware infections, the other on the Avalanche botnet.
Collector
Module: intelmq.bots.collectors.mail.collector_mail_attach
Configuration Parameters:
attach_regex: events.csv
extract_files: False
folder: INBOX
mail_host: __HOST__
mail_password: __PASSWORD__
mail_ssl: True
mail_user: __USERNAME__
name: CB-Report Malware infections via IMAP
provider: CERT-Bund
rate_limit: 86400
subject_regex: ^\\[CB-Report#.* Malware infections (\\(Avalanche\\) )?in country
Parser
Module: intelmq.bots.parsers.generic.parser_csv
Configuration Parameters:
columns: ["source.asn", "source.ip", "time.source", "classification.type", "malware.name", "source.port", "destination.ip", "destination.port", "destination.fqdn", "protocol.transport"]
default_url_protocol: http://
defaults_fields: {'classification.type': 'infected-system'}
delimiter: ,
skip_header: True
time_format: from_format|%Y-%m-%d %H:%M:%S
CERT.PL¶
N6 Stomp Stream¶
Public: no
Revision: 2018-01-20
Documentation: https://n6.cert.pl/en/
Description: N6 Collector - CERT.pl's N6 feed via the STOMP interface. Note that rate_limit does not apply to this bot, as it waits for messages on a stream.
Additional Information: Contact cert.pl to get access to the feed.
Collector
Module: intelmq.bots.collectors.stomp.collector
Configuration Parameters:
exchange: {insert your exchange point as given by CERT.pl}
name: N6 Stomp Stream
port: 61614
provider: CERT.PL
server: n6stream.cert.pl
ssl_ca_certificate: {insert path to CA file for CERT.pl's n6}
ssl_client_certificate: {insert path to client cert file for CERT.pl's n6}
ssl_client_certificate_key: {insert path to client cert key file for CERT.pl's n6}
Parser
Module: intelmq.bots.parsers.n6.parser_n6stomp
Configuration Parameters:
CINS Army¶
CINS Army List¶
Public: yes
Revision: 2018-01-20
Documentation: https://cinsscore.com/#list
Description: The CINS Army (CIArmy.com) list is a subset of the CINS Active Threat Intelligence ruleset, and consists of IP addresses that meet one of two basic criteria: 1) The IP’s recent Rogue Packet score factor is very poor, or 2) The IP has tripped a designated number of ‘trusted’ alerts across a given number of our Sentinels deployed around the world.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: http://cinsscore.com/list/ci-badguys.txt
name: CINS Army List
provider: CINS Army
rate_limit: 3600
Parser
Module: intelmq.bots.parsers.ci_army.parser
Configuration Parameters:
CZ.NIC¶
HaaS¶
Public: yes
Revision: 2020-07-22
Documentation: https://haas.nic.cz/
Description: SSH attackers against HaaS (Honeypot as a Service) provided by CZ.NIC, z.s.p.o. The dump is published once a day.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
extract_files: True
http_url: https://haas.nic.cz/stats/export/{time[%Y/%m/%Y-%m-%d]}.json.gz
http_url_formatting: {'days': -1}
rate_limit: 86400
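The {time[...]} placeholder together with http_url_formatting shifts the timestamp used when building the URL. Roughly equivalent plain Python, as an approximation of the collector's behaviour rather than its actual code:
from datetime import datetime, timedelta

# http_url_formatting {'days': -1} shifts the formatting time one day into the past
when = datetime.utcnow() + timedelta(days=-1)
url = "https://haas.nic.cz/stats/export/{:%Y/%m/%Y-%m-%d}.json.gz".format(when)
print(url)  # e.g. https://haas.nic.cz/stats/export/2024/05/2024-05-16.json.gz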
Parser
Module: intelmq.bots.parsers.cznic.parser_haas
Configuration Parameters:
Proki¶
Public: no
Revision: 2020-08-17
Documentation: https://csirt.cz/en/proki/
Description: Aggregation of various sources on malicious IP addresses (malware spreaders or C&C servers).
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://proki.csirt.cz/api/1/__APIKEY__/data/day/{time[%Y/%m/%d]}
http_url_formatting: {'days': -1}
name: Proki
provider: CZ.NIC
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.cznic.parser_proki
Configuration Parameters:
Calidog¶
CertStream¶
Public: yes
Revision: 2018-06-15
Documentation: https://medium.com/cali-dog-security/introducing-certstream-3fc13bb98067
Description: HTTP Websocket Stream from certstream.calidog.io providing data from Certificate Transparency Logs.
Additional Information: Be aware that this feed provides a lot of data and may overload your system quickly.
Collector
Module: intelmq.bots.collectors.calidog.collector_certstream
Configuration Parameters:
name: CertStream
provider: Calidog
Parser
Module: intelmq.bots.parsers.calidog.parser_certstream
Configuration Parameters:
CleanMX¶
Phishing¶
Public: no
Revision: 2018-01-20
Documentation: http://clean-mx.de/
Description: In order to download the CleanMX feed you need to use a custom user agent and register that user agent.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_timeout_sec: 120
http_url: http://support.clean-mx.de/clean-mx/xmlphishing?response=alive&domain=
http_user_agent: {{ your user agent }}
name: Phishing
provider: CleanMX
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.cleanmx.parser
Configuration Parameters:
Virus¶
Public: no
Revision: 2018-01-20
Documentation: http://clean-mx.de/
Description: In order to download the CleanMX feed you need to use a custom user agent and register that user agent.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_timeout_sec: 120
http_url: http://support.clean-mx.de/clean-mx/xmlviruses?response=alive&domain=
http_user_agent: {{ your user agent }}
name: Virus
provider: CleanMX
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.cleanmx.parser
Configuration Parameters:
CyberCrime Tracker¶
Latest¶
Public: yes
Revision: 2019-03-19
Documentation: https://cybercrime-tracker.net/index.php
Description: C2 servers
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://cybercrime-tracker.net/index.php
name: Latest
provider: CyberCrime Tracker
rate_limit: 86400
Parser
Module: intelmq.bots.parsers.html_table.parser
Configuration Parameters:
columns: ["time.source", "source.url", "source.ip", "malware.name", "__IGNORE__"]
default_url_protocol: http://
defaults_fields: {'classification.type': 'c2-server'}
skip_table_head: True
Danger Rulez¶
Bruteforce Blocker¶
Public: yes
Revision: 2018-01-20
Documentation: http://danger.rulez.sk/index.php/bruteforceblocker/
Description: Its main purpose is to block SSH bruteforce attacks via firewall.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: http://danger.rulez.sk/projects/bruteforceblocker/blist.php
name: Bruteforce Blocker
provider: Danger Rulez
rate_limit: 3600
Parser
Module: intelmq.bots.parsers.danger_rulez.parser
Configuration Parameters:
Dataplane¶
DNS Recursion Desired¶
Public: yes
Revision: 2021-09-09
Documentation: https://dataplane.org/
Description: Entries consist of fields with identifying characteristics of a source IP address that has been seen performing a DNS recursion desired query to a remote host. This report lists hosts that are suspicious of more than just port scanning. The host may be DNS server cataloging or searching for hosts to use for DNS-based DDoS amplification.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://dataplane.org/dnsrd.txt
name: DNS Recursion Desired
provider: Dataplane
rate_limit: 3600
Parser
Module: intelmq.bots.parsers.dataplane.parser
Configuration Parameters:
DNS Recursion Desired ANY¶
Public: yes
Revision: 2021-09-09
Documentation: https://dataplane.org/
Description: Entries consist of fields with identifying characteristics of a source IP address that has been seen performing a DNS recursion desired IN ANY query to a remote host. This report lists hosts that are suspicious of more than just port scanning. The host may be DNS server cataloging or searching for hosts to use for DNS-based DDoS amplification.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://dataplane.org/dnsrdany.txt
name: DNS Recursion Desired ANY
provider: Dataplane
rate_limit: 3600
Parser
Module: intelmq.bots.parsers.dataplane.parser
Configuration Parameters:
DNS Version¶
Public: yes
Revision: 2021-09-09
Documentation: https://dataplane.org/
Description: Entries consist of fields with identifying characteristics of a source IP address that has been seen performing a DNS CH TXT version.bind query to a remote host. This report lists hosts that are suspicious of more than just port scanning. The host may be DNS server cataloging or searching for vulnerable DNS servers.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://dataplane.org/dnsversion.txt
name: DNS Version
provider: Dataplane
rate_limit: 3600
Parser
Module: intelmq.bots.parsers.dataplane.parser
Configuration Parameters:
Protocol 41¶
Public: yes
Revision: 2021-09-09
Documentation: https://dataplane.org/
Description: Entries consist of fields with identifying characteristics of a host that has been detected to offer open IPv6-over-IPv4 tunneling. This could allow the host to be used as a public proxy against IPv6 hosts.
Collector
Module: intelmq.bots.collectors.http.collector_http
Configuration Parameters:
http_url: https://dataplane.org/proto41.txt
name: Protocol 41
provider: Dataplane
rate_limit: 3600
Parser
Module: intelmq.bots.parsers.dataplane.parser
Configuration Parameters:
SIP Query¶
Public: yes
Revision: 2018-01-20
Documentation: https://dataplane.org/
Description: Entries consist of fields with identifying characteristics of a source IP address that has been seen initiating a SIP OPTIONS query to a remote host. This report lists hosts that are suspicious of more than just port scanning. The hosts may be SIP server cataloging or conducting various forms of telephony abuse. Report is updated hourly.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://dataplane.org/sipquery.txt
name
:SIP Query
provider
:Dataplane
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.dataplane.parser
Configuration Parameters:
SIP Registration¶
Public: yes
Revision: 2018-01-20
Documentation: https://dataplane.org/
Description: Entries consist of fields with identifying characteristics of a source IP address that has been seen initiating a SIP REGISTER operation to a remote host. This report lists hosts that are suspicious of more than just port scanning. The hosts may be SIP client cataloging or conducting various forms of telephony abuse. Report is updated hourly.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://dataplane.org/sipregistration.txt
name
:SIP Registration
provider
:Dataplane
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.dataplane.parser
Configuration Parameters:
SMTP Data¶
Public: yes
Revision: 2021-09-09
Documentation: https://dataplane.org/
Description: Entries consist of fields with identifying characteristics of a host that has been seen initiating a SMTP DATA operation to a remote host. The source report lists hosts that are suspicious of more than just port scanning. The host may be SMTP server cataloging or conducting various forms of email abuse.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://dataplane.org/smtpdata.txt
name
:SMTP Data
provider
:Dataplane
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.dataplane.parser
Configuration Parameters:
SMTP Greet¶
Public: yes
Revision: 2021-09-09
Documentation: https://dataplane.org/
Description: Entries consist of fields with identifying characteristics of a host that has been seen initiating a SMTP HELO/EHLO operation to a remote host. The source report lists hosts that are suspicious of more than just port scanning. The host may be SMTP server cataloging or conducting various forms of email abuse.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://dataplane.org/smtpgreet.txt
name
:SMTP Greet
provider
:Dataplane
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.dataplane.parser
Configuration Parameters:
SSH Client Connection¶
Public: yes
Revision: 2018-01-20
Documentation: https://dataplane.org/
Description: Entries below consist of fields with identifying characteristics of a source IP address that has been seen initiating an SSH connection to a remote host. This report lists hosts that are suspicious of more than just port scanning. The hosts may be SSH server cataloging or conducting authentication attack attempts. Report is updated hourly.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://dataplane.org/sshclient.txt
name
:SSH Client Connection
provider
:Dataplane
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.dataplane.parser
Configuration Parameters:
SSH Password Authentication¶
Public: yes
Revision: 2018-01-20
Documentation: https://dataplane.org/
Description: Entries below consist of fields with identifying characteristics of a source IP address that has been seen attempting to remotely login to a host using SSH password authentication. The report lists hosts that are highly suspicious and are likely conducting malicious SSH password authentication attacks. Report is updated hourly.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://dataplane.org/sshpwauth.txt
name
:SSH Password Authentication
provider
:Dataplane
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.dataplane.parser
Configuration Parameters:
Telnet Login¶
Public: yes
Revision: 2021-09-09
Documentation: https://dataplane.org/
Description: Entries consist of fields with identifying characteristics of a host that has been seen initiating a telnet connection to a remote host. The source report lists hosts that are suspicious of more than just port scanning. The host may be telnet server cataloging or conducting authentication attack attempts.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://dataplane.org/telnetlogin.txt
name
:Telnet Login
provider
:Dataplane
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.dataplane.parser
Configuration Parameters:
VNC/RFB Login¶
Public: yes
Revision: 2021-09-09
Documentation: https://dataplane.org/
Description: Entries consist of fields with identifying characteristics of a host that has been seen initiating a VNC remote framebuffer (RFB) session to a remote host. The source report lists hosts that are suspicious of more than just port scanning. The host may be VNC/RFB server cataloging or conducting authentication attack attempts.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://dataplane.org/vncrfb.txt
name
:VNC/RFB Login
provider
:Dataplane
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.dataplane.parser
Configuration Parameters:
ESET¶
ETI Domains¶
Public: no
Revision: 2020-06-30
Documentation: https://www.eset.com/int/business/services/threat-intelligence/
Description: Domain data from ESET’s TAXII API.
Collector
Module: intelmq.bots.collectors.eset.collector
- Configuration Parameters:
collection
:ei.domains v2 (json)
endpoint
:eti.eset.com
password
:<password>
time_delta
:3600
username
:<username>
Parser
Module: intelmq.bots.parsers.eset.parser
Configuration Parameters:
ETI URLs¶
Public: no
Revision: 2020-06-30
Documentation: https://www.eset.com/int/business/services/threat-intelligence/
Description: URL data from ESET’s TAXII API.
Collector
Module: intelmq.bots.collectors.eset.collector
- Configuration Parameters:
collection
:ei.urls (json)
endpoint
:eti.eset.com
password
:<password>
time_delta
:3600
username
:<username>
Parser
Module: intelmq.bots.parsers.eset.parser
Configuration Parameters:
Fireeye¶
Malware Analysis System¶
Public: no
Revision: 2021-05-03
Documentation: https://www.fireeye.com/products/malware-analysis.html
Description: Process data from Fireeye mail and file analysis appliances. SHA1 and MD5 malware hashes are extracted and if there is network communication, also URLs and domains.
Collector
Module: intelmq.bots.collectors.fireeye.collector_mas
- Configuration Parameters:
host
:<hostname of your appliance>
http_password
:<your password>
http_username
:<your username>
request_duration
:<how old the data to fetch may be, e.g. 24_hours or 48_hours>
Parser
Module: intelmq.bots.parsers.fireeye.parser
Configuration Parameters:
Fraunhofer¶
DGA Archive¶
Public: no
Revision: 2018-01-20
Documentation: https://dgarchive.caad.fkie.fraunhofer.de/welcome/
Description: Fraunhofer DGA collector fetches data from Fraunhofer’s domain generation archive.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_password
:{{ your password}}
http_url
:https://dgarchive.caad.fkie.fraunhofer.de/today
http_username
:{{ your username}}
name
:DGA Archive
provider
:Fraunhofer
rate_limit
:10800
Parser
Module: intelmq.bots.parsers.fraunhofer.parser_dga
Configuration Parameters:
Have I Been Pwned¶
Enterprise Callback¶
Public: no
Revision: 2019-09-11
Documentation: https://haveibeenpwned.com/EnterpriseSubscriber/
Description: With the Enterprise Subscription of ‘Have I Been Pwned’ you are able to provide a callback URL and any new leak data is submitted to it. It is recommended to put a webserver with Authorization check, TLS etc. in front of the API collector.
- Additional Information: A minimal nginx configuration could look like:
server {
    listen 443 ssl http2;
    server_name [your host name];
    client_max_body_size 50M;
    ssl_certificate [path to your certificate];
    ssl_certificate_key [path to your key];
    location /[your private url] {
        if ($http_authorization != '[your private password]') {
            return 403;
        }
        proxy_pass http://localhost:5001/intelmq/push;
        proxy_read_timeout 30;
        proxy_connect_timeout 30;
    }
}
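You can test such a setup with a request like the following sketch (the host name, URL path and password are the placeholders from the configuration above; the /intelmq/push path is the proxied API collector endpoint):
curl -X POST "https://[your host name]/[your private url]" -H "Authorization: [your private password]" -H "Content-Type: application/json" -d '{"test": true}'
A 403 response indicates a wrong Authorization header; otherwise nginx forwards the request to the API collector listening on port 5001.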
Collector
Module: intelmq.bots.collectors.api.collector_api
- Configuration Parameters:
name
:Enterprise Callback
port
:5001
provider
:Have I Been Pwned
Parser
Module: intelmq.bots.parsers.hibp.parser_callback
Configuration Parameters:
MalwarePatrol¶
DansGuardian¶
Public: no
Revision: 2018-01-20
Documentation: https://www.malwarepatrol.net/non-commercial/
Description: Malware block list with URLs
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://lists.malwarepatrol.net/cgi/getfile?receipt={{ your API key }}&product=8&list=dansguardian
name
:DansGuardian
provider
:MalwarePatrol
rate_limit
:180000
Parser
Module: intelmq.bots.parsers.malwarepatrol.parser_dansguardian
Configuration Parameters:
MalwareURL¶
Latest malicious activity¶
Public: yes
Revision: 2018-02-05
Documentation: https://www.malwareurl.com/
Description: Latest malicious domains/IPs.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://www.malwareurl.com/
name
:Latest malicious activity
provider
:MalwareURL
rate_limit
:86400
Parser
Module: intelmq.bots.parsers.malwareurl.parser
Configuration Parameters:
McAfee Advanced Threat Defense¶
Sandbox Reports¶
Public: no
Revision: 2018-07-05
Documentation: https://www.mcafee.com/enterprise/en-us/products/advanced-threat-defense.html
Description: Processes reports from McAfee’s sandboxing solution via the openDXL API.
Collector
Module: intelmq.bots.collectors.opendxl.collector
- Configuration Parameters:
dxl_config_file
:{{location of dxl configuration file}}
dxl_topic
:/mcafee/event/atd/file/report
Parser
Module: intelmq.bots.parsers.mcafee.parser_atd
- Configuration Parameters:
verdict_severity
:4
Microsoft¶
BingMURLs via Interflow¶
Public: no
Revision: 2018-05-29
Documentation: https://docs.microsoft.com/en-us/security/gsp/informationsharingandexchange
Description: Collects Malicious URLs detected by Bing from the Interflow API. The feed is available via Microsoft’s Government Security Program (GSP).
Additional Information: Depending on the file sizes you may need to increase the parameter ‘http_timeout_sec’ of the collector.
Collector
Module: intelmq.bots.collectors.microsoft.collector_interflow
- Configuration Parameters:
api_key
:{{your API key}}
file_match
:^bingmurls_
http_timeout_sec
:300
name
:BingMURLs via Interflow
not_older_than
:2 days
provider
:Microsoft
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.microsoft.parser_bingmurls
Configuration Parameters:
CTIP C2 via Azure¶
Public: no
Revision: 2020-05-29
Documentation: https://docs.microsoft.com/en-us/security/gsp/informationsharingandexchange
Description: Collects the CTIP C2 feed from a shared Azure Storage. The feed is available via Microsoft’s Government Security Program (GSP).
Additional Information: The cache is needed for memorizing which files have already been processed, the TTL should be higher than the oldest file available in the storage (currently the last three days are available). The connection string contains endpoint as well as authentication information.
Collector
Module: intelmq.bots.collectors.microsoft.collector_azure
- Configuration Parameters:
connection_string
:{{your connection string}}
container_name
:ctip-c2
name
:CTIP C2 via Azure
provider
:Microsoft
rate_limit
:3600
redis_cache_db
:5
redis_cache_host
:127.0.0.1
redis_cache_port
:6379
redis_cache_ttl
:864000
Parser
Module: intelmq.bots.parsers.microsoft.parser_ctip
Configuration Parameters:
CTIP Infected via Azure¶
Public: no
Revision: 2022-06-01
Documentation: https://docs.microsoft.com/en-us/security/gsp/informationsharingandexchange http://www.dcuctip.com/
Description: Collects the CTIP (Sinkhole data) from a shared Azure Storage. The feed is available via Microsoft’s Government Security Program (GSP).
Additional Information: The cache is needed for memorizing which files have already been processed, the TTL should be higher than the oldest file available in the storage (currently the last three days are available). The connection string contains endpoint as well as authentication information. As many IPs occur very often in the data, you may want to use a deduplicator specifically for the feed. More information about the feed can be found on www.dcuctip.com after login with your GSP account.
Collector
Module: intelmq.bots.collectors.microsoft.collector_azure
- Configuration Parameters:
connection_string
:{{your connection string}}
container_name
:ctip-infected-summary
name
:CTIP Infected via Azure
provider
:Microsoft
rate_limit
:3600
redis_cache_db
:5
redis_cache_host
:127.0.0.1
redis_cache_port
:6379
redis_cache_ttl
:864000
Parser
Module: intelmq.bots.parsers.microsoft.parser_ctip
Configuration Parameters:
CTIP Infected via Interflow¶
Public: no
Revision: 2018-03-06
Documentation: https://docs.microsoft.com/en-us/security/gsp/informationsharingandexchange http://www.dcuctip.com/
Description: Collects the CTIP Infected feed (Sinkhole data for your country) files from the Interflow API. The feed is available via Microsoft’s Government Security Program (GSP).
Additional Information: Depending on the file sizes you may need to increase the parameter ‘http_timeout_sec’ of the collector. As many IPs occur very often in the data, you may want to use a deduplicator specifically for the feed. More information about the feed can be found on www.dcuctip.com after login with your GSP account.
Collector
Module: intelmq.bots.collectors.microsoft.collector_interflow
- Configuration Parameters:
api_key
:{{your API key}}
file_match
:^ctip_
http_timeout_sec
:300
name
:CTIP Infected via Interflow
not_older_than
:2 days
provider
:Microsoft
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.microsoft.parser_ctip
Configuration Parameters:
Netlab 360¶
DGA¶
Public: yes
Revision: 2018-01-20
Documentation: http://data.netlab.360.com/dga
Description: This feed lists the DGA family, the domain, and the start and end of the valid time (UTC) for a number of DGA families.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:http://data.netlab.360.com/feeds/dga/dga.txt
name
:DGA
provider
:Netlab 360
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.netlab_360.parser
Configuration Parameters:
Hajime Scanner¶
Public: yes
Revision: 2019-08-01
Documentation: https://data.netlab.360.com/hajime/
Description: This feed lists IP addresses of known Hajime botnet nodes. The data is obtained by joining the DHT network and interacting with Hajime nodes.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://data.netlab.360.com/feeds/hajime-scanner/bot.list
name
:Hajime Scanner
provider
:Netlab 360
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.netlab_360.parser
Configuration Parameters:
Magnitude EK¶
Public: yes
Revision: 2018-01-20
Documentation: http://data.netlab.360.com/ek
Description: This feed lists FQDN and possibly the URL used by Magnitude Exploit Kit. Information also includes the IP address used for the domain and last time seen.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:http://data.netlab.360.com/feeds/ek/magnitude.txt
name
:Magnitude EK
provider
:Netlab 360
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.netlab_360.parser
Configuration Parameters:
OpenPhish¶
Public feed¶
Public: yes
Revision: 2018-01-20
Documentation: https://www.openphish.com/
Description: OpenPhish is a fully automated self-contained platform for phishing intelligence. It identifies phishing sites and performs intelligence analysis in real time without human intervention and without using any external resources, such as blacklists.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://www.openphish.com/feed.txt
name
:Public feed
provider
:OpenPhish
rate_limit
:86400
Parser
Module: intelmq.bots.parsers.openphish.parser
Configuration Parameters:
PhishTank¶
Online¶
Public: no
Revision: 2022-11-21
Documentation: https://www.phishtank.com/developer_info.php
Description: PhishTank is a collaborative clearing house for data and information about phishing on the Internet.
Additional Information: Updated hourly as per the documentation. Download is possible without an API key, but limited to a few downloads per day.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
extract_files
:True
http_url
:https://data.phishtank.com/data/{{ your API key }}/online-valid.json.gz
name
:Online
provider
:PhishTank
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.phishtank.parser
Configuration Parameters:
PrecisionSec¶
Agent Tesla¶
Public: yes
Revision: 2019-04-02
Documentation: https://precisionsec.com/threat-intelligence-feeds/agent-tesla/
Description: Agent Tesla IoCs, URLs where the malware is hosted.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://precisionsec.com/threat-intelligence-feeds/agent-tesla/
name
:Agent Tesla
provider
:PrecisionSec
rate_limit
:86400
Parser
Module: intelmq.bots.parsers.html_table.parser
- Configuration Parameters:
columns
:["source.ip|source.url", "time.source"]
default_url_protocol
:http://
defaults_fields
:{'classification.type': 'malware-distribution'}
skip_table_head
:True
Shadowserver¶
Via API¶
Public: no
Revision: 2020-01-08
Documentation: https://www.shadowserver.org/what-we-do/network-reporting/api-documentation/
Description: Shadowserver sends out a variety of reports to subscribers, see documentation.
Additional Information: This configuration fetches user-configurable reports from the Shadowserver Reports API. For a list of reports, have a look at the Shadowserver collector and parser documentation.
Collector
Module: intelmq.bots.collectors.shadowserver.collector_reports_api
- Configuration Parameters:
api_key
:<API key>
country
:<CC>
rate_limit
:86400
redis_cache_db
:12
redis_cache_host
:127.0.0.1
redis_cache_port
:6379
redis_cache_ttl
:864000
secret
:<API secret>
types
:<single report or list of reports>
Parser
Module: intelmq.bots.parsers.shadowserver.parser_json
Configuration Parameters:
Via IMAP¶
Public: no
Revision: 2018-01-20
Documentation: https://www.shadowserver.org/what-we-do/network-reporting/
Description: Shadowserver sends out a variety of reports (see https://www.shadowserver.org/wiki/pmwiki.php/Services/Reports).
Additional Information: The configuration retrieves the data from e-mail attachments via IMAP.
Collector
Module: intelmq.bots.collectors.mail.collector_mail_attach
- Configuration Parameters:
attach_regex
:csv.zip
extract_files
:True
folder
:INBOX
mail_host
:__HOST__
mail_password
:__PASSWORD__
mail_ssl
:True
mail_user
:__USERNAME__
name
:Via IMAP
provider
:Shadowserver
rate_limit
:86400
subject_regex
:__REGEX__
Parser
Module: intelmq.bots.parsers.shadowserver.parser
Configuration Parameters:
Via Request Tracker¶
Public: no
Revision: 2018-01-20
Documentation: https://www.shadowserver.org/what-we-do/network-reporting/
Description: Shadowserver sends out a variety of reports (see https://www.shadowserver.org/wiki/pmwiki.php/Services/Reports).
Additional Information: The configuration retrieves the data from an RT/RTIR ticketing instance via the ticket attachment or a download.
Collector
Module: intelmq.bots.collectors.rt.collector_rt
- Configuration Parameters:
attachment_regex
:\\.csv\\.zip$
extract_attachment
:True
extract_download
:False
http_password
:{{ your HTTP Authentication password or null }}
http_username
:{{ your HTTP Authentication username or null }}
password
:__PASSWORD__
provider
:Shadowserver
rate_limit
:3600
search_not_older_than
:{{ relative time or null }}
search_owner
:nobody
search_queue
:Incident Reports
search_requestor
:autoreports@shadowserver.org
search_status
:new
search_subject_like
:\[__COUNTRY__\] Shadowserver __COUNTRY__
set_status
:open
take_ticket
:True
uri
:http://localhost/rt/REST/1.0
url_regex
:https://dl.shadowserver.org/[a-zA-Z0-9?_-]*
user
:__USERNAME__
Parser
Module: intelmq.bots.parsers.shadowserver.parser
Configuration Parameters:
Shodan¶
Country Stream¶
Public: no
Revision: 2021-03-22
Documentation: https://developer.shodan.io/api/stream
Description: Collects the Shodan stream for one or multiple countries from the Shodan API.
Additional Information: A Shodan account with streaming permissions is needed.
Collector
Module: intelmq.bots.collectors.shodan.collector_stream
- Configuration Parameters:
api_key
:<API key>
countries
:<comma-separated list of country codes>
error_retry_delay
:0
name
:Country Stream
provider
:Shodan
Parser
Module: intelmq.bots.parsers.shodan.parser
- Configuration Parameters:
error_retry_delay
:0
ignore_errors
:False
minimal_mode
:False
Spamhaus¶
ASN Drop¶
Public: yes
Revision: 2018-01-20
Documentation: https://www.spamhaus.org/drop/
Description: ASN-DROP contains a list of Autonomous System Numbers controlled by spammers or cyber criminals, as well as “hijacked” ASNs. ASN-DROP can be used to filter BGP routes which are being used for malicious purposes.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://www.spamhaus.org/drop/asndrop.txt
name
:ASN Drop
provider
:Spamhaus
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.spamhaus.parser_drop
Configuration Parameters:
CERT¶
Public: no
Revision: 2018-01-20
Documentation: https://www.spamhaus.org/news/article/705/spamhaus-launches-cert-insight-portal
Description: Spamhaus CERT Insight Portal. Access limited to CERTs and CSIRTs with national or regional responsibility.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:{{ your CERT portal URL }}
name
:CERT
provider
:Spamhaus
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.spamhaus.parser_cert
Configuration Parameters:
Drop¶
Public: yes
Revision: 2018-01-20
Documentation: https://www.spamhaus.org/drop/
Description: The DROP list will not include any IP address space under the control of any legitimate network - even if being used by “the spammers from hell”. DROP will only include netblocks allocated directly by an established Regional Internet Registry (RIR) or National Internet Registry (NIR) such as ARIN, RIPE, AFRINIC, APNIC, LACNIC or KRNIC or direct RIR allocations.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://www.spamhaus.org/drop/drop.txt
name
:Drop
provider
:Spamhaus
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.spamhaus.parser_drop
Configuration Parameters:
Dropv6¶
Public: yes
Revision: 2018-01-20
Documentation: https://www.spamhaus.org/drop/
Description: The DROPv6 list includes IPv6 ranges allocated to spammers or cyber criminals. DROPv6 will only include IPv6 netblocks allocated directly by an established Regional Internet Registry (RIR) or National Internet Registry (NIR) such as ARIN, RIPE, AFRINIC, APNIC, LACNIC or KRNIC or direct RIR allocations.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://www.spamhaus.org/drop/dropv6.txt
name
:Dropv6
provider
:Spamhaus
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.spamhaus.parser_drop
Configuration Parameters:
EDrop¶
Public: yes
Revision: 2018-01-20
Documentation: https://www.spamhaus.org/drop/
Description: EDROP is an extension of the DROP list that includes sub-allocated netblocks controlled by spammers or cyber criminals. EDROP is meant to be used in addition to the direct allocations on the DROP list.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://www.spamhaus.org/drop/edrop.txt
name
:EDrop
provider
:Spamhaus
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.spamhaus.parser_drop
Configuration Parameters:
Strangereal Intel¶
DailyIOC¶
Public: yes
Revision: 2019-12-05
Documentation: https://github.com/StrangerealIntel/DailyIOC
Description: Daily IOC from tweets and articles
Additional Information: The collector’s extra_fields parameter may be set to any of the fields from the GitHub content API response.
Collector
Module: intelmq.bots.collectors.github_api.collector_github_contents_api
- Configuration Parameters:
personal_access_token
:https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token
regex
:.*.json
repository
:StrangerealIntel/DailyIOC
Parser
Module: intelmq.bots.parsers.github_feed
Configuration Parameters:
Sucuri¶
Surbl¶
Malicious Domains¶
Public: no
Revision: 2018-09-04
Description: Detected malicious domains. Note that you have to have opened up Sponsored Datafeed Service (SDS) access to the SURBL data via rsync for your IP address.
Collector
Module: intelmq.bots.collectors.rsync.collector_rsync
- Configuration Parameters:
file
:wild.surbl.org.rbldnsd
rsync_path
:blacksync.prolocation.net::surbl-wild/
Parser
Module: intelmq.bots.parsers.surbl.parser
Configuration Parameters:
Team Cymru¶
CAP¶
Public: no
Revision: 2018-01-20
Documentation: https://www.team-cymru.com/CSIRT-AP.html https://www.cymru.com/$certname/report_info.txt
Description: Team Cymru provides daily lists of compromised or abused devices for the ASNs and/or netblocks within a CSIRT’s jurisdiction. This includes such information as bot-infected hosts, command and control systems, open resolvers, malware URLs, phishing URLs, and brute force attacks.
Additional Information: Two feed types are offered:
The new https://www.cymru.com/$certname/$certname_{time[%Y%m%d]}.txt
and the old https://www.cymru.com/$certname/infected_{time[%Y%m%d]}.txt
Both formats are supported by the parser and the new one is recommended; as of 2019-09-12, the old format was announced to be retired soon.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_password
:{{your password}}
http_url
:https://www.cymru.com/$certname/$certname_{time[%Y%m%d]}.txt
http_url_formatting
:True
http_username
:{{your login}}
name
:CAP
provider
:Team Cymru
rate_limit
:86400
Parser
Module: intelmq.bots.parsers.cymru.parser_cap_program
Configuration Parameters:
Full Bogons IPv4¶
Public: yes
Revision: 2018-01-20
Documentation: https://www.team-cymru.com/bogon-reference-http.html
Description: Fullbogons are a larger set which also includes IP space that has been allocated to an RIR, but not assigned by that RIR to an actual ISP or other end-user. IANA maintains a convenient IPv4 summary page listing allocated and reserved netblocks, and each RIR maintains a list of all prefixes that they have assigned to end-users. Our bogon reference pages include additional links and resources to assist those who wish to properly filter bogon prefixes within their networks.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://www.team-cymru.org/Services/Bogons/fullbogons-ipv4.txt
name
:Full Bogons IPv4
provider
:Team Cymru
rate_limit
:86400
Parser
Module: intelmq.bots.parsers.cymru.parser_full_bogons
Configuration Parameters:
Full Bogons IPv6¶
Public: yes
Revision: 2018-01-20
Documentation: https://www.team-cymru.com/bogon-reference-http.html
Description: Fullbogons are a larger set which also includes IP space that has been allocated to an RIR, but not assigned by that RIR to an actual ISP or other end-user. IANA maintains a convenient IPv4 summary page listing allocated and reserved netblocks, and each RIR maintains a list of all prefixes that they have assigned to end-users. Our bogon reference pages include additional links and resources to assist those who wish to properly filter bogon prefixes within their networks.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://www.team-cymru.org/Services/Bogons/fullbogons-ipv6.txt
name
:Full Bogons IPv6
provider
:Team Cymru
rate_limit
:86400
Parser
Module: intelmq.bots.parsers.cymru.parser_full_bogons
Configuration Parameters:
Threatminer¶
Recent domains¶
Public: yes
Revision: 2018-02-06
Documentation: https://www.threatminer.org/
Description: Latest malicious domains.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://www.threatminer.org/
name
:Recent domains
provider
:Threatminer
rate_limit
:86400
Parser
Module: intelmq.bots.parsers.threatminer.parser
Configuration Parameters:
Turris¶
Greylist¶
Public: yes
Revision: 2018-01-20
Documentation: https://project.turris.cz/en/greylist
Description: The data are processed and classified every week and behaviour of IP addresses that accessed a larger number of Turris routers is evaluated. The result is a list of addresses that have tried to obtain information about services on the router or tried to gain access to them. The list also contains a list of tags for each address which indicate what behaviour of the address was observed.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://www.turris.cz/greylist-data/greylist-latest.csv
name
:Greylist
provider
:Turris
rate_limit
:43200
Parser
Module: intelmq.bots.parsers.turris.parser
Configuration Parameters:
Greylist with PGP signature verification¶
Public: yes
Revision: 2018-01-20
Documentation: https://project.turris.cz/en/greylist
Description: The data are processed and classified every week and behaviour of IP addresses that accessed a larger number of Turris routers is evaluated. The result is a list of addresses that have tried to obtain information about services on the router or tried to gain access to them. The list also contains a list of tags for each address which indicate what behaviour of the address was observed.
The Turris Greylist feed provides PGP signatures for the provided files. You will need to import the public PGP key from the linked documentation page, currently available at https://pgp.mit.edu/pks/lookup?op=vindex&search=0x10876666 or from below. See the URL Fetcher Collector documentation for more information on PGP signature verification.
PGP Public key:
-----BEGIN PGP PUBLIC KEY BLOCK-----
Version: SKS 1.1.6
Comment: Hostname: pgp.mit.edu
mQINBFRl7D8BEADaRFoDa/+r27Gtqrdn8sZL4aSYTU4Q3gDr3TfigK8H26Un/Y79a/DUL1o0
o8SRae3uwVcjJDHZ6KDnxThbqF7URfpuCcCYxOs8p/eu3dSueqEGTODHWF4ChIh2japJDc4t
3FQHbIh2e3GHotVqJGhvxMmWqBFoZ/mlWvhjs99FFBZ87qbUNk7l1UAGEXeWeECgz9nGox40
3YpCgEsnJJsKC53y5LD/wBf4z+z0GsLg2GMRejmPRgrkSE/d9VjF/+niifAj2ZVFoINSVjjI
8wQFc8qLiExdzwLdgc+ggdzk5scY3ugI5IBt1zflxMIOG4BxKj/5IWsnhKMG2NLVGUYOODoG
pKhcY0gCHypw1bmkp2m+BDVyg4KM2fFPgQ554DAX3xdukMCzzZyBxR3UdT4dN7xRVhpph3Y2
Amh1E/dpde9uwKFk1oRHkRZ3UT1XtpbXtFNY0wCiGXPt6KznJAJcomYFkeLHjJo3nMK0hISV
GSNetVLfNWlTkeo93E1innbSaDEN70H4jPivjdVjSrLtIGfr2IudUJI84dGmvMxssWuM2qdg
FSzoTHw9UE9KT3SltKPS+F7u9x3h1J492YaVDncATRjPZUBDhbvo6Pcezhup7XTnI3gbRQc2
oEUDb933nwuobHm3VsUcf9686v6j8TYehsbjk+zdA4BoS/IdCwARAQABtC5UdXJyaXMgR3Jl
eWxpc3QgR2VuZXJhdG9yIDxncmV5bGlzdEB0dXJyaXMuY3o+iQI4BBMBAgAiBQJUZew/AhsD
BgsJCAcDAgYVCAIJCgsEFgIDAQIeAQIXgAAKCRDAQrU3EIdmZoH4D/9Jo6j9RZxCAPTaQ9WZ
WOdb1Eqd/206bObEX+xJAago+8vuy+waatHYBM9/+yxh0SIg2g5whd6J7A++7ePpt5XzX6hq
bzdG8qGtsCRu+CpDJ40UwHep79Ck6O/A9KbZcZW1z/DhbYT3z/ZVWALy4RtgmyC67Vr+j/C7
KNQ529bs3kP9AzvEIeBC4wdKl8dUSuZIPFbgf565zRNKLtHVgVhiuDPcxKmBEl4/PLYF30a9
5Tgp8/PNa2qp1DV/EZjcsxvSRIZB3InGBvdKdSzvs4N/wLnKWedj1GGm7tJhSkJa4MLBSOIx
yamhTS/3A5Cd1qoDhLkp7DGVXSdgEtpoZDC0jR7nTS6pXojcgQaF7SfJ3cjZaLI5rjsx0YLk
G4PzonQKCAAQG1G9haCDniD8NrrkZ3eFiafoKEECRFETIG0BJHjPdSWcK9jtNCupBYb7JCiz
Q0hwLh2wrw/wCutQezD8XfsBFFIQC18TsJAVgdHLZnGYkd5dIbV/1scOcm52w6EGIeMBBYlB
J2+JNukH5sJDA6zAXNl2I1H1eZsP4+FSNIfB6LdovHVPAjn7qXCw3+IonnQK8+g8YJkbbhKJ
sPejfg+ndpe5u0zX+GvQCFBFu03muANA0Y/OOeGIQwU93d/akN0P1SRfq+bDXnkRIJQOD6XV
0ZPKVXlNOjy/z2iN2A==
=wjkM
-----END PGP PUBLIC KEY BLOCK-----
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://www.turris.cz/greylist-data/greylist-latest.csv
name
:Greylist
provider
:Turris
rate_limit
:43200
signature_url
:https://www.turris.cz/greylist-data/greylist-latest.csv.asc
verify_pgp_signatures
:True
Parser
Module: intelmq.bots.parsers.turris.parser
Configuration Parameters:
University of Toulouse¶
Blacklist¶
Public: yes
Revision: 2018-01-20
Documentation: https://dsi.ut-capitole.fr/blacklists/
Description: Various blacklist feeds
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
extract_files
:true
http_url
:https://dsi.ut-capitole.fr/blacklists/download/{collection name}.tar.gz
name
:Blacklist
provider
:University of Toulouse
rate_limit
:43200
Parser
Module: intelmq.bots.parsers.generic.parser_csv
- Configuration Parameters:
columns
:{depends on a collection}
defaults_fields
:{'classification.type': '{depends on a collection}'}
delimiter
:false
VXVault¶
URLs¶
Public: yes
Revision: 2018-01-20
Documentation: http://vxvault.net/ViriList.php
Description: This feed provides IP addresses hosting Malware.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:http://vxvault.net/URL_List.php
name
:URLs
provider
:VXVault
rate_limit
:3600
Parser
Module: intelmq.bots.parsers.vxvault.parser
Configuration Parameters:
ViriBack¶
C2 Tracker¶
Public: yes
Revision: 2022-11-15
Documentation: https://viriback.com/
Description: Latest detected C2 servers.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://tracker.viriback.com/dump.php
name
:C2 Tracker
provider
:ViriBack
rate_limit
:86400
Parser
Module: intelmq.bots.parsers.generic.parser_csv
- Configuration Parameters:
columns
:["malware.name", "source.url", "source.ip", "time.source"]
defaults_fields
:{'classification.type': 'malware-distribution'}
skip_header
:True
WebInspektor¶
Unsafe sites¶
Public: yes
Revision: 2018-03-09
Description: Latest detected unsafe sites.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://app.webinspector.com/public/recent_detections/
name
:Unsafe sites
provider
:WebInspektor
rate_limit
:60
Parser
Module: intelmq.bots.parsers.webinspektor.parser
Configuration Parameters:
ZoneH¶
Defacements¶
Public: no
Revision: 2018-01-20
Documentation: https://zone-h.org/
Description: All the information contained in Zone-H’s cybercrime archive was either collected online from public sources or notified directly and anonymously to Zone-H.
Collector
Module: intelmq.bots.collectors.mail.collector_mail_attach
- Configuration Parameters:
attach_regex
:csv
extract_files
:False
folder
:INBOX
mail_host
:__HOST__
mail_password
:__PASSWORD__
mail_ssl
:True
mail_user
:__USERNAME__
name
:Defacements
provider
:ZoneH
rate_limit
:3600
sent_from
:datazh@zone-h.org
subject_regex
:Report
Parser
Module: intelmq.bots.parsers.zoneh.parser
Configuration Parameters:
cAPTure¶
AS Details¶
Public: yes
Revision: 2018-01-20
Documentation: https://www.dshield.org/reports.html
Description: No description provided by feed provider.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://dshield.org/asdetailsascii.html?as={{ AS Number }}
name
:AS Details
provider
:cAPTure
rate_limit
:86400
Parser
Module: intelmq.bots.parsers.dshield.parser_asn
Configuration Parameters:
Block¶
Public: yes
Revision: 2018-01-20
Documentation: https://www.dshield.org/reports.html
Description: This list summarizes the top 20 attacking class C (/24) subnets over the last three days. The number of ‘attacks’ indicates the number of targets reporting scans from this subnet.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:https://www.dshield.org/block.txt
name
:Block
provider
:cAPTure
rate_limit
:86400
Parser
Module: intelmq.bots.parsers.dshield.parser_block
Configuration Parameters:
Ponmocup Domains CIF Format¶
Public: yes
Revision: 2018-01-20
Documentation: http://security-research.dyndns.org/pub/malware-feeds/
Description: List of ponmocup malware redirection domains and infected web-servers from cAPTure. See also http://security-research.dyndns.org/pub/botnet-links.htm and http://c-apt-ure.blogspot.com/search/label/ponmocup. The data in the CIF format is not equal to the Shadowserver CSV format; the reasons are unknown.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:http://security-research.dyndns.org/pub/malware-feeds/ponmocup-infected-domains-CIF-latest.txt
name
:Infected Domains
provider
:cAPTure
rate_limit
:10800
Parser
Module: intelmq.bots.parsers.dyn.parser
Configuration Parameters:
Ponmocup Domains Shadowserver Format¶
Public: yes
Revision: 2020-07-08
Documentation: http://security-research.dyndns.org/pub/malware-feeds/
Description: List of ponmocup malware redirection domains and infected web-servers from cAPTure. See also http://security-research.dyndns.org/pub/botnet-links.htm and http://c-apt-ure.blogspot.com/search/label/ponmocup. The data in the Shadowserver CSV format is not equal to the CIF format; the reasons are unknown.
Collector
Module: intelmq.bots.collectors.http.collector_http
- Configuration Parameters:
http_url
:http://security-research.dyndns.org/pub/malware-feeds/ponmocup-infected-domains-shadowserver.csv
name
:Infected Domains
provider
:cAPTure
rate_limit
:10800
Parser
Module: intelmq.bots.parsers.generic.parser_csv
- Configuration Parameters:
columns
:["time.source", "source.ip", "source.fqdn", "source.urlpath", "source.port", "protocol.application", "extra.tag", "extra.redirect_target", "extra.category"]
compose_fields
:{'source.url': 'http://{0}{1}'}
defaults_fields
:{'classification.type': 'malware-distribution'}
delimiter
:,
skip_header
:True
IntelMQ API¶
intelmq-api is a hug-based API for the IntelMQ project.
Contents
Installing and running intelmq-api¶
intelmq-api requires the IntelMQ package to be installed on the system (it uses intelmqctl to control the botnet).
You can install the intelmq-api package using your preferred system package installation mechanism or using the pip Python package installer.
We provide packages for the intelmq-api for the same operating systems as we do for the intelmq package itself.
For the list of supported distributions, please see the intelmq Installation page.
Our repository page gives installation instructions for various operating systems. No additional set-up steps are needed if you use these packages.
The intelmq-api provides the route /api for managing the IntelMQ installation.
For development purposes and testing you can also run intelmq-api directly using hug:
hug -m intelmq_api.serve
Installation using pip¶
The intelmq-api packages ship a configuration file in ${PREFIX}/etc/intelmq/api-config.json, a positions configuration for the manager in ${PREFIX}/etc/intelmq/manager/positions.conf, a virtualhost configuration file for Apache 2 in ${PREFIX}/etc/intelmq/api-apache.conf and a sudoers configuration file in ${PREFIX}/etc/intelmq/api-sudoers.conf.
The value of ${PREFIX} depends on your environment and is something like /usr/local/lib/pythonX.Y/dist-packages/ (where X.Y is your Python version).
- The file ${PREFIX}/etc/intelmq/api-apache.conf needs to be placed in the correct place for your Apache 2 installation. On Debian and Ubuntu, move the file to /etc/apache2/conf-available/api-apache.conf and then execute a2enconf api-apache (a command sketch follows after this list). On CentOS, RHEL and Fedora, move the file to /etc/httpd/conf.d/. On openSUSE, move the file to /etc/apache2/conf.d/. Don’t forget to reload your webserver afterwards.
- The file ${PREFIX}/etc/intelmq/api-config.json needs to be moved to /etc/intelmq/api-config.json.
- The file ${PREFIX}/etc/intelmq/manager/positions.conf needs to be moved to /etc/intelmq/manager/positions.conf.
- Last but not least, move the file ${PREFIX}/etc/intelmq/api-sudoers.conf to /etc/sudoers.d/01_intelmq-api and adapt the webserver user name in this file. Set the file permissions to 0o440.
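On Debian or Ubuntu, these steps could look like the following sketch (the ${PREFIX} value, the Python version and the use of root privileges are assumptions; adapt them to your system):
PREFIX=/usr/local/lib/python3.9/dist-packages
mv "$PREFIX/etc/intelmq/api-apache.conf" /etc/apache2/conf-available/api-apache.conf
a2enconf api-apache
mv "$PREFIX/etc/intelmq/api-config.json" /etc/intelmq/api-config.json
mkdir -p /etc/intelmq/manager
mv "$PREFIX/etc/intelmq/manager/positions.conf" /etc/intelmq/manager/positions.conf
mv "$PREFIX/etc/intelmq/api-sudoers.conf" /etc/sudoers.d/01_intelmq-api
# adapt the webserver user name in the sudoers file before setting permissions
chmod 440 /etc/sudoers.d/01_intelmq-api
systemctl reload apache2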
Afterwards continue with the section Permissions below.
IntelMQ 2.3.1 comes with a tool intelmqsetup which performs these set-up steps automatically.
Please note that the tool is very new and may not detect all situations correctly. Please report any bugs you observe.
The tool is idempotent; you can execute it multiple times.
Configuring intelmq-api¶
Depending on your setup you might have to install sudo to make it possible for the intelmq-api to run the intelmq command as the user-account usually used to run intelmq (which is also often called intelmq).
intelmq-api is configured using a configuration file in json format.
intelmq-api tries to load the configuration file from /etc/intelmq/api-config.json and ${PREFIX}/etc/intelmq/api-config.json, but you can override the path by setting the environment variable INTELMQ_API_CONFIG.
(When using Apache, you can do this by modifying the Apache configuration file shipped with intelmq-api; the file contains an example.)
When running the API using hug, you can set the environment variable like this:
INTELMQ_API_CONFIG=/etc/intelmq/api-config.json hug -m intelmq_api.serve
The default configuration which is shipped with the packages is also listed here for reference:
{
"intelmq_ctl_cmd": ["sudo", "-u", "intelmq", "intelmqctl"],
"allowed_path": "/opt/intelmq/var/lib/bots/",
"session_store": "/etc/intelmq/api-session.sqlite",
"session_duration": 86400,
"allow_origins": ["*"]
}
On Debian-based systems, the default path for the session_store is /var/lib/dbconfig-common/sqlite3/intelmq-api/intelmqapi, because the Debian package uses the Debian packaging tools to manage the database file.
The following configuration options are available:
intelmq_ctl_cmd
: Your intelmqctl command. If this is not set in a configuration file the default is used, which is ["sudo", "-u", "intelmq", "/usr/local/bin/intelmqctl"]. The option "intelmq_ctl_cmd" is a list of strings so that we can avoid shell-injection vulnerabilities, because no shell is involved when running the command. This means that if the command you want to use needs parameters, they have to be separate strings.
allowed_path
: intelmq-api can grant read-only access to specific files - this setting defines the path those files can reside in.
session_store
: this is an optional path to a sqlite database, which is used for session storage and authentication. If it is not set (which is the default), no authentication is used!
session_duration
: the maximal duration of a session; it's 86400 seconds by default.
allow_origins
: a list of origins the responses of the API can be shared with. Allows every origin by default.
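If your intelmqctl is installed outside the default path, for example in a virtualenv, a customized configuration could look like the following sketch (all paths and the origin are assumptions to adapt):
{
    "intelmq_ctl_cmd": ["sudo", "-u", "intelmq", "/opt/venv/bin/intelmqctl"],
    "allowed_path": "/opt/intelmq/var/lib/bots/",
    "session_store": "/etc/intelmq/api-session.sqlite",
    "session_duration": 86400,
    "allow_origins": ["https://intelmq.example.com"]
}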
Permissions¶
intelmq-api tries to write a couple of configuration files in the ${PREFIX}/etc/intelmq directory - this is only possible if you set the permissions accordingly, given that intelmq-api runs under a different user.
The user the API runs as also needs write access to the folder the session_store is located in, otherwise there will be an error accessing the session data.
If you’re using the default Apache 2 setup, you might want to set the group of the files to www-data and give it write permissions (chmod -R g+w <directoryname>).
In addition to that, the intelmq-manager tries to store the bot positions via the API into the file ${PREFIX}/etc/intelmq/manager/positions.conf.
You should therefore create the folder ${PREFIX}/etc/intelmq/manager and the file positions.conf in it.
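For the default Apache 2 setup, the following sketch creates the folder and file and grants the webserver group write access (www-data is Debian’s webserver group and an assumption here; substitute your own):
mkdir -p ${PREFIX}/etc/intelmq/manager
touch ${PREFIX}/etc/intelmq/manager/positions.conf
chgrp -R www-data ${PREFIX}/etc/intelmq
chmod -R g+w ${PREFIX}/etc/intelmq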
Adding a user¶
If you enable the session_store you will have to create user accounts to be able to access the API functionality. You can do this using intelmq-api-adduser:
intelmq-api-adduser --user <username> --password <password>
A note on SELinux¶
On systems with SELinux enabled, the API will fail to call intelmqctl. Therefore, SELinux needs to be disabled:
setenforce 0
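Note that setenforce 0 does not persist across reboots. To keep SELinux permissive permanently, you can set SELINUX=permissive in /etc/selinux/config, for example:
sed -i 's/^SELINUX=.*/SELINUX=permissive/' /etc/selinux/config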
We welcome contributions to provide SELinux policies.
Usage from programs¶
The IntelMQ API can also be used from programs, not just browsers. To do so, first send a POST-Request with JSON-formatted data to http://localhost/intelmq/v1/api/login/
{
    "username": "$your_username",
    "password": "$your_password"
}
With valid credentials, the JSON-formatted response contains the login_token.
This token can be used like an API key in the Authorization header for the next API calls:
Authorization: $login_token
Here is a full example using curl:
> curl --location --request POST "http://localhost/intelmq/v1/api/login/"\
--header "Content-Type: application/x-www-form-urlencoded"\
--data-urlencode "username=$username"\
--data-urlencode "password=$password"
{"login_token":"68b329da9893e34099c7d8ad5cb9c940","username":"$username"}
> curl --location "http://localhost/intelmq/v1/api/version"\
--header "Authorization: 68b329da9893e34099c7d8ad5cb9c940"
{"intelmq":"3.0.0rc1","intelmq-manager":"2.3.1"}
The same approach also works for automation tools such as Ansible.
Frequent operational problems¶
IntelMQCtlError¶
If the command is not configured correctly, you’ll see exceptions on startup like this:
intelmq_manager.runctl.IntelMQCtlError: <ERROR_MESSAGE>
This means the intelmqctl command could not be executed as a subprocess.
The <ERROR_MESSAGE> should indicate why.
Access Denied / Authentication Required “Please provide valid Token verification credentials”¶
If you see the IntelMQ Manager interface and menu, but the API calls to the back-end querying configuration and status of IntelMQ fail with “Access Denied” or “Authentication Required: Please provide valid Token verification credentials” errors, you may not be logged in, while the API requires authentication.
By default, the API requires authentication. Create user accounts and login with them or - if you have other protection means in place - deactivate the authentication requirement by removing or renaming the session_store parameter in the configuration.
Internal Server Error¶
There can be various reasons for internal server errors. You need to look at the error log of your web server, for example /var/log/apache2/error.log or /var/log/httpd/error_log for Apache 2. It could be that the sudo setup is not functional, that the configuration file or session database file cannot be read or written, or that there are other errors in regards to the execution of the API program.
Can I just install it from the deb/rpm packages while installing IntelMQ from a different source?¶
Yes, you can install the API and the Manager from the deb/rpm repositories, and install your IntelMQ from somewhere else, e.g. a local repository. However, knowledge about Python and system administration experience is recommended if you do so.
The packages install IntelMQ to /usr/lib/python3*/site-packages/intelmq/.
Installing with pip results in /usr/local/lib/python3*/site-packages/intelmq/ (and some other accompanying resources), which overrides the installation in /usr/lib/.
You probably need to adapt the configuration parameter intelmq_ctl_cmd to the /usr/local/bin/intelmqctl executable, plus some other tweaks.
sqlite3.OperationalError: attempt to write a readonly database¶
SQLite does not only need write access to the database itself, but also the folder the database file is located in. Please check that the webserver has write permissions to the folder the session file is located in.
Getting help¶
You can use the IntelMQ users mailing lists and GitHub issues for getting help and getting in touch with other users and developers. See also the Introduction page.
IntelMQ Manager¶
IntelMQ Manager is a graphical interface to manage configurations for IntelMQ. Its goal is to provide an intuitive tool to allow non-programmers to specify the data flow in IntelMQ.
Contents
Installation¶
To use the intelmq-manager webinterface, a working intelmq installation which provides access to the IntelMQ API is required. Please refer to the IntelMQ Installation page.
intelmq-manager can be installed with different methods. Use the same one as you did for IntelMQ itself and the IntelMQ API.
Native Packages¶
As the repositories are already set-up on your system, you can simply install the package intelmq-manager.
Our repository page gives installation instructions for various operating systems. No additional set-up steps are needed.
The webserver configuration (which is also shown below) for Apache will be automatically installed and the HTML files are stored under /usr/share/intelmq-manager/html.
The webinterface is then available at http://localhost/intelmq-manager.
Docker¶
The IntelMQ Manager is included in our Docker-images. See the section Docker in our installation guide.
Installation using pip¶
For installation via pip, the situation is more complex. The intelmq-manager package does not contain ready-to-use files; they need to be built locally. First, let’s install the Manager itself:
pip3 install intelmq-manager
If your system uses wheel-packages, not the source distribution, you can use the intelmqsetup tool, which performs these set-up steps automatically but may not detect all situations correctly. If it finds intelmq-manager installed, its build routine is called. The files are placed in /usr/share/intelmq_manager/html, where the default Apache configuration expects them.
If your system uses the dist-package or if you are using a local source, the tool may not do all required steps. To call the build routine manually, use intelmq-manager-build --output-dir your/preferred/output/directory/.
intelmq-manager ships with a default configuration for the Apache webserver (manager-apache.conf):
Alias /intelmq-manager /usr/share/intelmq_manager/html/
<Directory /usr/share/intelmq_manager/html>
<IfModule mod_headers.c>
Header set Content-Security-Policy "script-src 'self'"
Header set X-Content-Security-Policy "script-src 'self'"
</IfModule>
</Directory>
This file needs to be placed in the correct place for your Apache 2 installation.
- On Debian and Ubuntu, place the file at /etc/apache2/conf-available/manager-apache.conf and then execute a2enconf manager-apache (a command sketch follows after this list).
- On CentOS, RHEL and Fedora, place the file in /etc/httpd/conf.d/ and reload the webserver.
- On openSUSE, place the file in /etc/apache2/conf.d/ and reload the webserver.
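On Debian or Ubuntu, these steps could look like the following sketch (the source path of the shipped manager-apache.conf depends on your installation method and is an assumption here):
mv ${PREFIX}/etc/intelmq/manager-apache.conf /etc/apache2/conf-available/manager-apache.conf
a2enconf manager-apache
systemctl reload apache2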
Security considerations¶
Never ever run intelmq-manager on a public webserver without SSL and proper authentication!
The way the current version is written, anyone who can reach it can change IntelMQ’s configuration files by sending HTTP POST requests. intelmq-manager will reject non-JSON data, but nevertheless, we don’t want anyone to be able to reconfigure an IntelMQ installation.
Therefore you will need authentication and SSL. Authentication can be handled by the IntelMQ API. Please refer to its documentation on how to enable authentication and setup accounts.
Never ever allow unencrypted, unauthenticated access to intelmq-manager!
Configuration¶
In the file /usr/share/intelmq-manager/html/js/vars.js set ROOT to the URL of your intelmq-api installation - by default that’s on the same host as intelmq-manager.
CSP Headers¶
It is recommended to set these two headers for all requests:
Content-Security-Policy: script-src 'self'
X-Content-Security-Policy: script-src 'self'
Screenshots¶
Pipeline¶
This interface lets you visually configure the whole IntelMQ pipeline and the parameters of every single bot. You will be able to see the pipeline in a graph-like visualisation similar to the following screenshot (click to enlarge):

Bots Configuration¶
When you add a node or edit one you’ll be presented with a form with the available parameters for a bot. There you can easily change the parameters as shown in the screenshot:

After editing the bots’ configuration and pipeline, simply click “Save Configuration” to automatically write the changes to the correct files. The configurations are now ready to be deployed.
Note well: if you do not press “Save Configuration” your changes will be lost whenever you reload the web page or move between different tabs within the IntelMQ manager page.
Botnet Management¶
When you save a configuration you can go to the ‘Management’ section to see what bots are running and start/stop the entire botnet, or a single bot.

Botnet Monitoring¶
You can also monitor the logs of individual bots or see the status of the queues for the entire system or for single bots.
In this next example we can see the number of queued messages for all the queues in the system.

In the following example we can see the status information of a single bot: the number of queued messages in the queues that are related to that bot and also the last 20 log lines of that single bot.

Usage¶
Keyboard Shortcuts¶
Any underlined letter denotes an access key shortcut. The needed keyboard shortcut differs per browser:
Firefox: Alt + Shift + letter
Chrome & Chromium: Alt + letter
Configuration Paths¶
The IntelMQ Manager queries the configuration file paths and directory names from intelmqctl and therefore any global environment variables (if set) are effective in the Manager too.
The interface for this query is intelmqctl debug --get-paths; the result is also shown on the /about.html page of your IntelMQ Manager installation.
For more information on the ability to adapt paths, have a look at the Configuration section.
Configuration page¶
Named queues / paths¶
With IntelMQ Manager you can set the name of certain paths by double-clicking on the line which connects two bots:

The name is then displayed along the edge:

Frequently asked questions¶
Contents
For questions about the API, have a look at the API documentation page
Send IntelMQ events to Splunk¶
Configure Splunk so that it can receive logs (IntelMQ events) on a TCP port.
Then use the TCP Output bot and configure it according to the Splunk configuration you applied.
Permission denied when using Redis Unix socket¶
If you get an error like this:
intelmq.lib.exceptions.PipelineError: pipeline failed - ConnectionError('Error 13 connecting to unix socket: /var/run/redis/redis.sock. Permission denied.',)
Make sure the intelmq user has sufficient permissions for the socket.
In /etc/redis/redis.conf (or wherever your configuration is), check the permissions and set them for example to group-writeable:
unixsocketperm 770
And add the user intelmq to the redis-group:
usermod -aG redis intelmq
Why is the time invalid?¶
If you wonder why you are getting errors like this:
intelmq.lib.exceptions.InvalidValue: invalid value '2017-03-06T07:36:29' () for key 'time.source'
IntelMQ requires time zone information for all timestamps. Without a time zone, the time is ambiguous and therefore rejected. For example, '2017-03-06T07:36:29+00:00' would be accepted.
How can I improve the speed?¶
In most cases the bottlenecks are look-up experts. In these cases you can easily use the integrated load balancing features.
Multithreading¶
When using the AMQP broker, you can make use of Multi-threading. See the Multithreading (Beta) section.
“Classic” load-balancing (Multiprocessing)¶
Before multithreading was available in IntelMQ, and in case you use Redis as the broker, the only way to do load balancing involves more work.
Create multiple instances of the same bot and connect them all to the same source and destination bots. Then set the parameter load_balance to true for the bot which sends the messages to the duplicated bots. The bot then sends each message to only one of the destination queues and not to all of them.
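In the same notation as the bot configuration listings above, the relevant parameter on the sending bot is a single setting (a sketch):
load_balance
:true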
True multiprocessing is not available in IntelMQ. See also this discussion on a possible enhanced load balancing.
Other options¶
For any bottleneck based on (online) lookups, optimize the lookup itself and if possible use local databases.
It is also possible to use multiple servers to spread the workload. To get the messages from one system to the other you can either directly connect to the other’s pipeline or use a fast exchange mechanism such as the TCP Collector/Output (make sure to secure the network by other means).
Removing raw data for higher performance and less space usage¶
If you do not need the raw data, you can safely remove it. For events (after parsers), it keeps the original data, e.g. a line of a CSV file. In reports it keeps the actual data to be parsed, so don’t delete the raw field in reports - between collectors and parsers.
The raw data consumes about 30% - 50% of the messages’ size. The size of course depends on how much additional data you add to it and how much data the report includes. Dropping it will improve the speed as less data needs to be transferred and processed at each step.
In a bot
You can do this for example by using the Field Reducer Expert. The configuration could be:
type: blacklist
keys: raw
Other solutions are the Modify bot and the Sieve bot. The last one is a good choice if you already use it and you only need to add the command:
remove raw
In the database
In case you store data in the database and you want to keep its size small, you can (periodically) delete the raw data there.
To remove the raw data for a events table of a PostgreSQL database, you can use something like:
UPDATE events SET raw = NULL WHERE "time.source" < '2018-07-01';
If the database is big, make sure to only update small parts of the database by using an appropriate WHERE clause. If you do not see any negative performance impact, you can increase the size of the chunks; otherwise the events in the output bot may queue up. The id column can also be used instead of the source's time.
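A minimal sketch of such a chunked clean-up as a stand-alone Python script, assuming a psycopg2 connection to the EventDB; the DSN and the date range are placeholders:
from datetime import date, timedelta
import psycopg2

conn = psycopg2.connect("dbname=eventdb user=intelmq")  # placeholder DSN
day = date(2018, 1, 1)
while day < date(2018, 7, 1):
    with conn, conn.cursor() as cur:  # one transaction per one-day chunk
        cur.execute('UPDATE events SET raw = NULL '
                    'WHERE "time.source" >= %s AND "time.source" < %s;',
                    (day, day + timedelta(days=1)))
    day += timedelta(days=1)
conn.close()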
Another way of reducing the raw data in the database is described in the EventDB documentation: Separating raw values in PostgreSQL using view and trigger
My bot(s) died on startup with no errors logged¶
Rather than starting your bot(s) with intelmqctl start, try intelmqctl run [bot]. This will provide valuable debug output you might not otherwise see, pointing to issues like system configuration errors.
Orphaned Queues¶
This section has been moved to the section Orphaned Queues.
Multithreading is not available for this bot¶
Multithreading is not available for some bots, and it requires the AMQP broker. Possible reasons why a certain bot or setup does not support Multithreading include:
Multithreading is only available when using the AMQP broker.
For most collectors, Multithreading is disabled. Otherwise this would lead to duplicated data, as the data retrieval is not atomic.
Some bots use libraries which are not thread safe. Look at the bot's documentation for more information.
Some bots' operations are not thread safe. Look at the bot's documentation for more information.
If you think this mapping is wrong, please report a bug.
Docker: Security Headers¶
If you run our docker image in production, we recommend setting security headers.
You can do this by creating a new file called example_config/nginx/security.conf
in the cloned intelmq-docker
repository.
Write the following inside the configuration file, and change the http(s)://<your-domain>
to your domain name.
server_tokens off; # turn off server tokens: the Server header will show only 'nginx' instead of the full version
add_header X-Frame-Options SAMEORIGIN; # https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Frame-Options
add_header X-Content-Type-Options nosniff; # https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-Content-Type-Options
add_header X-XSS-Protection "1; mode=block"; # https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/X-XSS-Protection
add_header Content-Security-Policy "script-src 'self' 'unsafe-inline' http(s)://<your-domain>; frame-src 'self' http(s)://<your-domain>; object-src 'self' http(s)://<your-domain>"; # https://developer.mozilla.org/en-US/docs/Web/HTTP/CSP
After you have created the file, edit the docker-compose.yml and mount it into the nginx container with
volumes:
- ./example_config/nginx/security.conf:/etc/nginx/conf.d/security.conf
IMPORTANT: Mount the exact file name & not the directory, because otherwise you would overwrite the whole directory and the other files inside the container would be gone.
Connecting with other systems¶
IntelMQ Universe¶
Contents
IntelMQ is more than the core library itself; many programs are developed around it in the IntelMQ universe. This document provides an overview of the ecosystem and all related tools. If you think something is missing, please let us know!
Unless otherwise stated, the products are maintained by the IntelMQ community.
IntelMQ Core¶
This is IntelMQ itself, as it is available on GitHub.
The Core includes all the components required for processing data feeds. This includes the bots, configuration, pipeline, the internal data format, management tools etc.
IntelMQ Manager¶
The Manager is the best-known software and can be seen as the face of IntelMQ. It provides a graphical user interface for the management tool intelmqctl.

IntelMQ Webinput CSV¶
A web-based interface to ingest CSV data into IntelMQ with on-line validation and live feedback.
This interface allows inserting “one-shot” data feeds into IntelMQ without the need to configure bots in IntelMQ.
Developed and maintained by CERT.at.
→ Repository: intelmq-webinput-csv

IntelMQ Mailgen¶
A solution allowing an IntelMQ setup with a complex contact database, managed by a web interface and sending out aggregated email reports. In other words: it sends grouped notifications to network owners using SMTP.
Developed and maintained by Intevation, initially funded by BSI.
It consists of these three components, which can also be used on their own.
IntelMQ CertBUND Contact¶
The certbund-contact consists of two IntelMQ expert bots, which fetch and process the information from the contact database, and scripts to import RIPE data into the contact database. Based on user-defined rules, the experts determine to which contact the event is to be sent, and which e-mail template and attachment format to use.
IntelMQ Fody¶
Fody is a web-based interface for Mailgen. It allows reading and editing contacts, querying sent mails (tickets) and calling up data from the EventDB.
It can also be used to just query the EventDB without using Mailgen.

intelmq-mailgen¶
Sends emails with grouped event data to the contacts determined by the certbund-contact. Mails can be encrypted with PGP.
“Constituency Portal” tuency¶
A web application helping CERTs to enable members of their constituency to self-administrate how they get warnings related to their network objects (IP addresses, IP ranges, autonomous systems, domains). tuency is developed by Intevation for CERT.at.
It features organizational hierarchies, contact roles, self-administration and network objects per organization (autonomous systems, network ranges, (sub)domains, RIPE organization handles). A network object claiming and approval process prevents abuse. A hierarchical rule system on the network objects allows fine-grained settings. The tagging system for contacts and organizations complements the contact-management features of the portal. Authentication is based on Keycloak, which enables the re-use of existing user accounts in the portal. The integrated API enables IntelMQ to query the portal for the right abuse contact and notification settings with the Tuency expert.

“Constituency Portal” do-portal (not developed any further)¶
Note: The do-portal is deprecated and succeeded by tuency.
A contact portal with organizational hierarchies, role functionality and network objects based on RIPE, allows self-administration by the contacts. Can be queried from IntelMQ and integrates the stats-portal.
Stats Portal¶
A Grafana-based statistics portal for the EventDB. Can be integrated into do-portal. It uses aggregated data to serve statistical data quickly.

Malware Name Mapping¶
A mapping of malware names from different feeds, which use different names, to a common family name.
IntelMQ-Docker¶
A repository with tools for the IntelMQ Docker instance.
Developed and maintained by CERT.at.
ELK Stack¶
If you wish to run IntelMQ with ELK (Elasticsearch, Logstash, Kibana), it is entirely possible. This guide assumes the reader is familiar with basic configuration of ELK and does not aim to cover using ELK in general. It is based on version 6.8.0 (ELK is a fast-moving train, therefore things might change). Assuming you have an IntelMQ (and Redis) installation in place, let's dive in.
Configuring IntelMQ for Logstash¶
In order to pass IntelMQ events to Logstash we will utilize the already installed Redis. Add a new Redis Output Bot to your pipeline. As the minimum, fill in the following parameters: bot-id, redis_server_ip (can be a hostname), redis_server_port, redis_password (if required, else set it to empty!), redis_queue (name for the queue). It is recommended to use a different redis_db parameter than the ones used by IntelMQ (specified as source_pipeline_db, destination_pipeline_db and statistics_database).
Example values:
bot-id: logstash-output
redis_server_ip: 10.10.10.10
redis_server_port: 6379
redis_db: 4
redis_queue: logstash-queue
Notes
Unfortunately you will not be able to monitor this redis queue via IntelMQ Manager.
Configuring Logstash¶
Logstash defines pipelines as well. In the Logstash pipeline configuration you need to specify where it should look for IntelMQ events, what to do with them and where to pass them.
Input¶
This part describes how to receive data from Redis queue. See the example configuration and comments below:
input {
redis {
host => "10.10.10.10"
port => 6379
db => 4
data_type => "list"
key => "logstash-queue"
}
}
host - same as redis_server_ip from the Redis Output Bot
port - the redis_server_port from the Redis Output Bot
db - the redis_db parameter from the Redis Output Bot
data_type - set to list
key - same as redis_queue from the Redis Output Bot
Notes
You can also use syntax like this: host => "${REDIS_HOST:10.10.10.10}". The value will be taken from the environment variable $REDIS_HOST. If the environment variable is not defined, the default value of 10.10.10.10 will be used instead.
Filter (optional)¶
Before passing the data to the database you can apply certain changes. This is done with filters. See an example:
filter {
mutate {
lowercase => ["source.geolocation.city", "classification.identifier"]
remove_field => ["__type", "@version"]
}
date {
match => ["time.observation", "ISO8601"]
}
}
Notes
It is not recommended to apply any modifications to the data (within the mutate key) outside of IntelMQ. All necessary modifications should be done only by the appropriate IntelMQ bots. This example only demonstrates the possibility.
It is recommended to use the date filter: generally we have two timestamp fields - time.source (provided by the feed source; this can be understood as the time when the event happened, however it is not always present) and time.observation (when IntelMQ collected this event). Logstash also adds another field, @timestamp, with the time of processing by Logstash. While it can be useful for debugging, I recommend setting @timestamp to the same value as time.observation.
Output¶
The pipeline also needs output, where we define our database (Elasticsearch). The simplest way of doing so is defining an output like this:
output {
elasticsearch {
hosts => ["http://10.10.10.11:9200", "http://10.10.10.12:9200"]
index => "intelmq-%{+YYYY.MM}"
}
}
hosts - Elasticsearch host (or more) with the correct port (9200 by default)
index - name of the index where to insert data
Notes
The author's experience, hardware equipment and the amount of events collected led to having a separate index for each month. This might not necessarily suit your needs, but it is a suggested option.
By default the ELK stack uses insecure HTTP. It is possible to set up Security for secure connections and basic user management. This is possible with the Basic (free) license since versions 6.8.0 and 7.1.0.
Configuring Elasticsearch¶
Configuring Elasticsearch is entirely up to you; consult the official documentation. What you will most likely need is something called index template mappings. IntelMQ provides a tool for generating such mappings. See ElasticMapper Tool.
Notes
A default installation of the Elasticsearch database allows anyone with cURL and network connectivity administrative access to the database. Make sure you secure your toys!
MISP integrations in IntelMQ¶
While MISP and IntelMQ seem to solve similar problems at first sight, their intentions and strengths differ significantly.
In a nutshell, MISP stores manually curated indicators (called attributes) grouped in events. An event can have an arbitrary number of attributes. MISP correlates these indicators with each other and can synchronize the data between multiple MISP instances.
On the other side, IntelMQ in its essence (not considering the EventDB) has no state or database, but is stream-oriented. IntelMQ acts as a toolbox which can be configured as needed to automate mass data processing with little or no human interaction. At the end of the processing, the data may land in some database or be sent to other systems.
The two systems do not intend to replace each other, nor do they compete. They integrate seamlessly and complement each other, enabling more use-cases.
MISP API Collector¶
The MISP API Collector fetches data from MISP via the MISP API.
Look at the Bots’ documentation for more information.
MISP Expert¶
The MISP Expert searches MISP by using the MISP API
for attributes/events matching the source.ip
of the event.
The MISP Attribute UUID and MISP Event ID of the newest attribute are added to the event.
Look at the Bots’ documentation for more information.
MISP Feed Output¶
This bot creates a complete MISP feed ready to be configured in MISP as incoming data source.
Look at the Bots’ documentation for more information.
MISP API Output¶
Can be used to directly create MISP events in a MISP instance by using the MISP API.
Look at the Bots’ documentation for more information.
IntelMQ - n6 Integration¶
n6 is an Open Source Tool with very similar aims as IntelMQ: processing and distributing IoC data. The use-cases, architecture and features differ and both tools have non-overlapping strengths. n6 is maintained and developed by CERT.pl.
Information about n6 can be found here:
Website: n6.cert.pl
Source Code: github.com/CERT-Polska/n6
n6 documentation: n6.readthedocs.io
n6sdk developer documentation: n6sdk.readthedocs.io


Data format¶
The internal data representation differs between IntelMQ and n6, so any data exchange between the systems requires a format conversion. For example, in n6 one message can contain multiple IP addresses, but IntelMQ is intentionally restricted to one IP address per message. Therefore, one n6 event results in one or more IntelMQ events. Because of this, and some other naming differences and ambiguities, the format conversion is not bidirectional.
Data exchange interface¶
n6 offers a STOMP interface via the RabbitMQ broker, which can be used for both sending and receiving data. IntelMQ offers both a STOMP collector bot for receiving data from n6, as well as a STOMP output bot for sending data to n6 instances.
Data conversion¶
IntelMQ can parse n6 data using the n6 parser and n6 can parse IntelMQ data using the Intelmq2n6 parser.
Complete example¶
Data flow n6 to IntelMQ¶

Data flow IntelMQ to n6¶

CERT.pl Data feed¶
CERT.pl offers a data feed available to their partners through the STOMP interface. Our feeds documentation contains details on how it can be enabled in IntelMQ: CERT.pl n6 STOMP stream
Webinput CSV¶
The IntelMQ Webinput CSV software can also be used together with n6. The documentation on this component can be found in the software’s repository: https://github.com/certat/intelmq-webinput-csv/blob/master/docs/webinput-n6.md
EventDB¶
The EventDB is not a software in itself.
The EventDB is a database (usually PostgreSQL) that gets filled with data from IntelMQ using the SQL Output Bot.
The events table itself¶
IntelMQ comes with the intelmq_psql_initdb command line tool. It creates an SQL file containing:
A CREATE TABLE events statement with all valid IntelMQ fields as columns and correct types
Several indexes as examples for a good read & search performance
All elements of this SQL file can be adapted and extended before running the SQL file against a database, especially the indexes.
Having an events table as outlined in the SQL file, IntelMQ’s SQL Output Bot can write all received events into this database table.
This events table is the core of the so-called EventDB and also required by all other sections of this document.
EventDB Utilities¶
Some scripts related to the EventDB are located in the contrib/eventdb folder in the IntelMQ git repository.
Apply Malware Name Mapping¶
The apply_mapping_eventdb.py script applies the malware name mapping to the EventDB. Source and destination columns can be given, also a local file. If no local file is present, the mapping can be downloaded on demand. It queries the database for all distinct malware names with the taxonomy “malicious-code” and sets another column to the malware family name.
Apply Domain Suffix¶
The apply_domain_suffix.py script writes the public domain suffix to the source.domain_suffix / destination.domain_suffix columns, extracted from source.fqdn / destination.fqdn.
Usage¶
The Python scripts can connect to a PostgreSQL server with an eventdb database and an events table. The command line interface for both scripts is the same. See --help for more information:
apply_mapping_eventdb.py -h
apply_domain_suffix.py -h
PostgreSQL trigger¶
The PostgreSQL trigger keeps track of the oldest inserted/updated "time.source" data. This can be useful to (re-)generate statistics or aggregation data.
The SQL script can be executed in the database directly.
EventDB Statistics¶
The EventDB provides a great base for statistical analysis of the data.
The eventdb-stats repository contains a Python script that generates an HTML file and includes the Plotly JavaScript Open Source Graphing Library. By modifying the configuration file it is possible to configure various queries that are then displayed using graphs:

Using EventDB with Timescale DB¶
TimescaleDB is a PostgreSQL extension adding time-series support, which is quite handy as you don't have to learn another syntax. You can use the SQL queries as before; the extension handles the rest. To see all limitations, please check the TimescaleDB documentation.
What is time-series?¶
Time-series databases were invented because traditional database designs, whether relational or NoSQL, are not made for time-based data. Their big benefit over other database designs is the performance of time-based search patterns. As IntelMQ's data is time-based, this design fits well and will give you a performance boost.
How to setup¶
Thanks to TimescaleDB it's very easy to set up.
1. Choose your preferred TimescaleDB environment & follow the installation instructions.
2. Create a hypertable, which is the TimescaleDB time-series structure, for the events table: SELECT create_hypertable('events', 'time.source');
3. The hypertable is now set up & TimescaleDB takes care of the rest. You can perform queries as usual; for further information please check the TimescaleDB documentation.
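As a sketch, step 2 can also be scripted from Python, assuming a psycopg2 connection with sufficient privileges, a still-empty events table and placeholder connection parameters:
import psycopg2

conn = psycopg2.connect("dbname=eventdb user=intelmq")  # placeholder DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS timescaledb;")  # needs superuser rights
    # Turn the events table into a hypertable, partitioned by "time.source":
    cur.execute("SELECT create_hypertable('events', 'time.source');")
conn.close()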
How to upgrade from my existing database?¶
To update your existing database to use this time-series feature, just follow the How to setup instructions. You can run the create_hypertable command even on already existing tables, BUT there are some limitations imposed by TimescaleDB.
Separating raw values in PostgreSQL using view and trigger¶
In order to reduce the row size of the events table, the raw column's data can be separated from the other columns. While the raw data amounts to about 30-50% of a data row's size, it is not used in most database queries, as it serves only as a backup. Other possibilities to reduce or get rid of this field are described in the FAQ, section Removing raw data for higher performance and less space usage.
The steps described here are best performed before the events table is filled with data, but can as well be done with existing data.
Starting from an existing events table (see the first section of this document), the approach requires four steps:
Deleting or renaming the raw column of the events table.
Creating a table raws which holds only the raw field of the events, linking both tables using the event_id.
Creating the view v_events which joins the tables events and raws.
Creating the function process_v_events_insert and the INSERT trigger tr_events.
These steps bring several advantages:
All INSERT statements can contain all data, including the raw field.
No code changes are needed in the IntelMQ output bot or your own scripts. A migration is seamless.
PostgreSQL itself ensures that the data of both tables is consistent and linked correctly.
The complete SQL script can be found in the contrib/eventdb directory of IntelMQ. It does not cover the raw column deletion step, to avoid accidental data loss - you need to do this step manually.
Abuse-contact look-ups¶
The right decision whom to contact about a specific incident is vital to get the incident resolved as quickly as possible. Different types of events may require different abuse-contacts to be selected. For example, issues about a device, e.g. a vulnerability in the operating system or an application, are better sent to the hoster, which can inform the server administrator. For website-related issues, like defacements or phishing, the domain owner (maintaining the content of the website) could be the better and more direct contact. Additionally, different CERTs have different approaches and different contact databases. Multiple information sources hold different information, and some sources are more accurate than others. IntelMQ can query multiple sources of abuse-contacts and combine them. Internal databases, like a Constituency Portal (see ecosystem), provide high-quality and first-hand contact information. The RIPE document Sources of Abuse Contact Information for Abuse Handlers contains a good summary of this complex topic.
Sources for abuse-contacts¶
All these bots add the queried contacts to the IntelMQ events in the field source.abuse_contact, if not stated otherwise in the documentation.
Sources for domain-based abuse-contacts¶
These bots are suitable for domain-based abuse-contact look-ups.
RDAP expert queries private and public RDAP servers for source.fqdn and adds the contact information to the event as source.abuse_contact.
Trusted Introducer Lookup Expert expert queries a locally cached Trusted Introducer team directory for the TLD or domain (first match) of source.fqdn.
Sources for IP address-based abuse-contacts¶
These bots are suitable for IP address- and ASN-based abuse-contact look-ups.
Abusix expert queries the online Abusix service.
DO Portal Expert Bot expert queries an instance of the do-portal software (deprecated).
Tuency expert queries an instance of the tuency Constituency Portal for the IP address. The Portal also takes into account any notification rules, which are saved additionally in the event.
RIPE expert queries the online RIPE database for IP-Address and AS contacts.
Trusted Introducer Lookup Expert expert queries a locally cached Trusted Introducer team directory for the Autonomous system source.asn.
Generic sources for abuse-contacts¶
Generic DB Lookup expert for local data sources, like database tables mapping ASNs to abuse-contact or Country Codes to abuse-contact.
uWhoisd expert for fetching whois-data, not extracting abuse-contact information
Helpful other bots for pre-processing¶
Cymru Whois to look up ASN, Geolocation, and BGP prefix for *.ip.
Domain Suffix to look up the public suffix of the domain in *.fqdn.
Gethostbyname to resolve *.ip from *.fqdn.
MaxMind GeoIP to look up Geolocation information for *.ip.
Reverse DNS to resolve *.reverse_dns from *.ip.
RIPE to look up *.asn and Geolocation information for *.ip.
Tor Nodes for filtering out Tor nodes.
Url2FQDN to extract *.fqdn/*.ip from *.url.
Combining the lookup approaches¶
In order to get the best contact, it may be necessary to combine multiple abuse-contact sources. IntelMQ’s modularity provides methods to arrange and configure the bots as needed. Among others, the following bots can help in getting the best result:
Filter expert: Your lookup process may be different for different types of data. E.g. website-related issues may be better addressed to the domain owner and device-related issues to the hoster.
Modify expert: Allows you to set values based on filter and also format values based on the value of other fields.
Sieve expert: Very powerful expert which allows filtering and routing (to different subsequent bots) based on if-expressions. It supports set operations (field value is in list) as well as sub-network operations for IP address networks in CIDR notation in the expression part. You can also set the abuse-contact directly.
Getting involved¶
Developers Guide¶
Contents
Intended Audience¶
This guide is for developers of IntelMQ. It explains the code architecture, coding guidelines as well as ways you can contribute code or documentation. If you have not done so, please read the Introduction first. Once you feel comfortable running IntelMQ with open source bots and you feel adventurous enough to contribute to the project, this guide is for you. It does not matter if you are an experienced Python programmer or just a beginner. There are a lot of samples to help you out.
However, before we go into the details, it is important to observe and internalize some overall project goals.
Goals¶
It is important that all developers agree on and stick to these meta-guidelines. IntelMQ tries to:
Be well tested. For developers this means, we expect you to write unit tests for bots. Every time.
Reduce the complexity of system administration
Reduce the complexity of writing new bots for new data feeds
Make your code easily and pleasantly readable
Reduce the probability of losing events anywhere in the process by providing persistence functionality (even across system crashes)
Strictly adhere to the existing Data Format for key-values in events
Always use JSON format for all messages internally
Help and support the interconnection between IntelMQ and existing tools like AbuseHelper, CIF, etc. or new tools (in other words: we will not accept data-silos!)
Provide an easy way to store data into Log Collectors like ElasticSearch, Splunk
Provide an easy way to create your own black-lists
Provide easy to understand interfaces with other systems via HTTP RESTful API
The main takeaway from the list above is: things MUST stay intuitive and easy. How do you ultimately test if things are still easy? Let new programmers test-drive your features; if they are not understandable in 15 minutes, go back to the drawing board.
Similarly, if code does not get accepted upstream by the main developers, it is usually only because of the ease-of-use argument. Do not give up, go back to the drawing board, and re-submit again.
Development Environment¶
Installation¶
Developers can create a fork of the IntelMQ repository in order to commit new code there and then open pull requests to the main repository. Otherwise you can just use 'certtools' as the username below.
The following instructions will use pip3 install -e, which gives you a so-called editable installation. No code is copied into the library directories, there's just a link to your code. However, configuration files are still required to be moved to /opt/intelmq as the instructions show.
The traditional way to work with IntelMQ is to install it globally and have a separate user for running it. If you wish to keep your machine's Python libraries separate, e.g. for development purposes, you could alternatively use a Python virtual environment and your local user to run IntelMQ. Please use your preferred way from the instructions below.
Directories explained¶
For development purposes, you need two directories: one for a local repository copy, and the second as a root directory for the IntelMQ installation.
The default IntelMQ root directory is /opt/intelmq. This directory is used for configurations (/opt/intelmq/etc), local states (/opt/intelmq/var/lib) and logs (/opt/intelmq/var/log). If you want to change it, please set the INTELMQ_ROOT_DIR environment variable with a desired location.
For the repository directory, you can use any path that is accessible by the users you use to run IntelMQ. For a globally installed IntelMQ, the directory has to be readable by other unprivileged users (e.g. home directories on Fedora can't be read by other users by default).
To keep the commands in this guide universal, we will use environment variables for the repository and installation paths. You can set them with the following commands:
# Adjust paths if you want to use non-standard directories
export INTELMQ_REPO=/opt/dev_intelmq
export INTELMQ_ROOT_DIR=/opt/intelmq
Note
If using a non-default installation directory, remember to keep the root directory variable set for every run of IntelMQ commands. If you don't, the default location /opt/intelmq will be used.
Using globally installed IntelMQ¶
sudo -s
git clone https://github.com/<your username>/intelmq.git $INTELMQ_REPO
cd $INTELMQ_REPO
pip3 install -e .
useradd -d $INTELMQ_ROOT_DIR -U -s /bin/bash intelmq
intelmqsetup
Using virtual environment¶
git clone https://github.com/<your username>/intelmq.git $INTELMQ_REPO
cd $INTELMQ_REPO
python -m venv .venv
source .venv/bin/activate
pip install -e .
# If you use a non-local directory as INTELMQ_ROOT_DIR, use following
# command to create it and change the ownership.
sudo install -g `whoami` -o `whoami` -d $INTELMQ_ROOT_DIR
# For local directory, just create it with mkdir:
mkdir $INTELMQ_ROOT_DIR
intelmqsetup --skip-ownership
Note
Please do not forget that configuration files and log files will be available in $INTELMQ_ROOT_DIR. However, if your development is somehow related to any shipped configuration file, you need to apply the changes in your repository $INTELMQ_REPO/intelmq/etc/.
Additional services¶
Some features require additional services, like a message queue or a database. The commonly used services are provided for development purposes in the Docker Compose file contrib/development-tools/docker-compose-common-services.yaml in the repository. You can use it to run the services on your machine in Docker containers, or decide to configure them in another way. To run them using Docker Compose, use the following command from the main repository directory:
# For older Docker versions, you may need to use `docker-compose` command
docker compose -f contrib/development-tools/docker-compose-common-services.yaml up -d
This will start containers with Redis, RabbitMQ, PostgreSQL and MongoDB in the background.
How to develop¶
After you have successfully set up your IntelMQ development environment, you can perform any development on any .py file in $INTELMQ_REPO. After a change, you can use the normal procedure to run the bots:
su - intelmq # Use for global installation
source .venv/bin/activate # Use for virtual environment installation
intelmqctl start spamhaus-drop-collector
tail -f $INTELMQ_ROOT_DIR/var/log/spamhaus-drop-collector.log
You can also add new bots by creating the new .py file in the proper directory inside $INTELMQ_REPO/intelmq. However, your IntelMQ installation with pip3 needs to be updated. Please check the following section.
Update¶
In case you developed a new bot, you need to update your current development installation. In order to do that, please follow this procedure:
Make sure that you have your new bot in the right place.
Update pip metadata and new executables:
sudo -s # Use for global installation
source .venv/bin/activate # Use for virtual environment installation
cd /opt/dev_intelmq
pip3 install -e .
If you’re using the global installation, an additional step of changing permissions and ownership is necessary:
find $INTELMQ_ROOT_DIR/ -type d -exec chmod 0770 {} \+
find $INTELMQ_ROOT_DIR/ -type f -exec chmod 0660 {} \+
chown -R intelmq.intelmq $INTELMQ_ROOT_DIR
## if you use the intelmq manager (adapt the webservers' group if needed):
chown intelmq.www-data $INTELMQ_ROOT_DIR/etc/*.conf
Now you can test run your new bot following this procedure:
su - intelmq # Use for global installation
source .venv/bin/activate # Use for virtual environment installation
intelmqctl start <bot_id>
Testing¶
Libraries required for tests are listed in the setup.py file. You can install them with pip:
pip3 install -e .[development]
or the package management of your operating system.
All changes have to be tested and new contributions should be accompanied by according unit tests. For security reasons, please do not run the tests as root, just like any other IntelMQ component. Any other unprivileged user works.
You can run the tests by changing to the directory with the IntelMQ repository and running either unittest or pytest. For a virtual environment installation, please activate it and omit the sudo -u from the examples below:
cd $INTELMQ_REPO
sudo -u intelmq python3 -m unittest {discover|filename} # or
sudo -u intelmq pytest [filename]
sudo -u intelmq python3 setup.py test # uses a build environment (no external dependencies)
Some bots need local databases to succeed. If you only want to test one explicit test file, give the file path as argument.
There are multiple GitHub Action Workflows setup for automatic testing, which are triggered on pull requests. You can also easily activate them for your forks.
There are a bunch of environment variables which switch on/off some tests:
INTELMQ_TEST_DATABASES: databases such as postgres, elasticsearch, mongodb are not tested by default. Set this environment variable to 1 to test those bots. These tests need preparation, e.g. running databases with users and certain passwords etc. Have a look at the .github/workflows/unittests.yml and the corresponding .github/workflows/scripts/setup-full.sh in IntelMQ’s repository for steps to set databases up.
INTELMQ_SKIP_INTERNET: tests requiring internet connection will be skipped if this is set to 1.
INTELMQ_SKIP_REDIS: redis-related tests are run by default, set this to 1 to skip those.
INTELMQ_TEST_EXOTIC: some bots and tests require libraries which may not be available, those are skipped by default. To run them, set this to 1.
INTELMQ_TEST_REDIS_PASSWORD: Set this value to the password for the local redis database if needed.
INTELMQ_LOOKYLOO_TEST: Set this value to run the lookyloo tests. The public lookyloo instance will be used by default.
For example, to run all tests you can use:
INTELMQ_TEST_DATABASES=1 INTELMQ_TEST_EXOTIC=1 pytest intelmq/tests/
The tests use the configuration files in your working directory, not those installed in /opt/intelmq/etc/ or /etc/. You can run the tests for a locally changed intelmq without affecting an installation or requiring root to run them.
Development Guidelines¶
Coding-Rules¶
Most important: KEEP IT SIMPLE! This cannot be overestimated. Feature creep can destroy any good software project. If new folks cannot understand what you wrote in 10-15 minutes, it is not good. It's not about performance, etc. It's about readability.
In general, we follow PEP 0008. We recommend reading it before committing code.
There are some exceptions: sometimes it does not make sense to check for every PEP 8 error (such as whitespace indentation when you want to make a dict assignment look pretty). Therefore, we do have some exceptions defined in the setup.cfg file.
We support Python 3 only.
All strings in internal IntelMQ objects (Event, Report, etc.) MUST be in UTF-8 Unicode format.
Any data received from external sources MUST be transformed into UTF-8 Unicode format before adding it to IntelMQ objects.
Any component of the IntelMQ MUST be independent of the message queue technology (Redis, RabbitMQ, etc…).
Please add a license and copyright header to your bots. There is a Github action that tests for reuse compliance of your code files.
Layout Rules¶
intelmq/
lib/
bot.py
cache.py
message.py
pipeline.py
utils.py
bots/
collector/
<bot name>/
collector.py
parser/
<bot name>/
parser.py
expert/
<bot name>/
expert.py
output/
<bot name>/
output.py
/conf
runtime.yaml
Assume you want to create a bot for a new 'Abuse.ch' feed, and it turns out that it is necessary to create different parsers for the respective kinds of events (e.g. malicious URLs). The usual hierarchy 'intelmq/bots/parser/<FEED>/parser.py' would not be suitable, because it is necessary to have multiple parsers for each Abuse.ch feed. The solution is to use the same hierarchy with an additional "description" in the file name, separated by an underscore. Also see the section Directories and Files naming.
Example (including the current ones):
/intelmq/bots/parser/abusech/parser_domain.py
/intelmq/bots/parser/abusech/parser_ip.py
/intelmq/bots/parser/abusech/parser_ransomware.py
/intelmq/bots/parser/abusech/parser_malicious_url.py
Please document your added/modified code.
For doc strings, we are using the sphinx-napoleon-google-type-annotation.
Additionally, Python’s type hints/annotations are used, see PEP 484.
Configuration Files Path: /opt/intelmq/etc/
PID Files Path: /opt/intelmq/var/run/
Logs Files and dumps Path: /opt/intelmq/var/log/
Additional Bot Files Path, e.g. templates or databases: /opt/intelmq/var/lib/bots/[bot-name]/
Any directory and file of IntelMQ has to follow the Directories and Files naming. Any file name or folder name has to:
be lowercase; if the name has multiple words, the spaces between them must be removed or replaced by underscores;
be self-explanatory about what the content contains.
In the bot directories, the name must correspond to the feed provider. If necessary and applicable, the feed name can and should be used as a postfix for the filename.
Examples:
intelmq/bots/parser/taichung/parser.py
intelmq/bots/parser/cymru/parser_full_bogons.py
intelmq/bots/parser/abusech/parser_ransomware.py
The class name of the bot (e.g. PhishTank Parser) must correspond to the type of the bot (e.g. Parser), e.g. PhishTankParserBot.
IntelMQ Data Format Rules¶
Any component of IntelMQ MUST respect the IntelMQ Data Format.
Reference: IntelMQ Data Format - Data Format
Code Submission Rules¶
The main repository is in github.com/certtools/intelmq.
There are a couple of forks which might be regularly merged into the main repository. They are independent and can have incompatible changes and can deviate from the upstream repository.
We use semantic versioning. A short summary:
a.x are stable releases
a.b.x are bugfix/patch releases
a.x must be compatible with version a.0 (i.e. API/config compatibility)
If you contribute something, please fork the repository, create a separate branch and use this for pull requests, see section below.
"master" is the stable branch. It holds the latest stable release. Non-developers should only work on this branch. The recommended log level is WARNING. Code is only added by merges from the maintenance branches.
"maintenance/a.b.x" branches accumulate (cherry-picked) patches for a maintenance release (a.b.x). Recommended for experienced users who deploy intelmq themselves. No new features will be added to these branches.
“develop” is the development branch for the next stable release (a.x). New features must go there. Developers may want to work on this branch. This branch also holds all patches from maintenance releases if applicable. The recommended log level is DEBUG.
Separate branches to develop features or bug fixes may be used by any contributor.
Make separate pull requests / branches on GitHub for changes. This allows us to discuss things via GitHub.
We prefer one Pull Request per feature or change. If you have a bunch of small fixes, please don't create one PR per fix :)
Only very small changes (docs, …) might be committed directly to development branches without a Pull Request by the core team.
Keep the balance between atomic commits and keeping the amount of commits per PR small. You can use interactive rebasing to squash multiple small commits into one (rebase -i [base-branch]). Only rebase code that is not yet published or used by others; otherwise they may run into conflicts.
Make sure your PR is mergeable into the develop branch and all tests are successful.
If possible sign your commits with GPG.
We assume here that origin is your own fork. We first add the upstream repository:
> git remote add upstream https://github.com/certtools/intelmq.git
Syncing develop:
> git checkout develop
> git pull upstream develop
> git push origin develop
You can do the same with the branches master and maintenance.
Create a separate feature-branch to work on, sync develop with upstream. Create a working branch from develop:
> git checkout develop
> git checkout -b new-feature
# your work
> git commit
Or, for bugfixes, create a separate bugfix-branch to work on, sync maintenance with upstream. Create a working branch from maintenance:
> git checkout maintenance
> git checkout -b bugfix
# your work
> git commit
Getting upstream’s changes for master or any other branch:
> git checkout develop
> git pull upstream develop
> git push origin develop
There are two possibilities to get upstream's commits into your branch: rebasing and merging. With rebasing, your history is rewritten, putting your changes on top of all other commits. You can use this if your changes are not published yet (or only in your fork).
> git checkout new-feature
> git rebase develop
Using the -i flag for rebase enables interactive rebasing. You can then remove, reorder and squash commits, and rewrite commit messages, beginning with the given branch, e.g. develop.
Or use merging. This doesn't break the history. It's considered safer, but it also pollutes the history with merge commits.
> git checkout new-feature
> git merge develop
You can then create a PR from your branch to our upstream repository, using GitHub's web interface.
If it fixes an existing issue, please use GitHub syntax, e.g.: fixes certtools/intelmq#<IssueID>
If we don’t discuss it, it’s probably not tested.
System Overview¶
In the intelmq/lib/ directory you can find some libraries:
Bots: Defines base structure for bots and handling of startup, stop, messages etc.
Cache: For some expert bots it does make sense to cache external lookup results. Redis is used here.
Harmonization: For defined types, checks and sanitation methods are implemented.
Message: Defines Events and Reports classes, uses harmonization to check validity of keys and values according to config.
Pipeline: Writes messages to message queues. Only Redis is implemented for production use; AMQP support is in beta.
Test: Base class for bot tests with predefined test and assert methods.
Utils: Utility functions used by system components.
Code Architecture¶

Pipeline¶
collector bot
TBD
Bot Developer Guide¶
There’s a dummy bot including tests at intelmq/tests/lib/test_parser_bot.py.
Please use the correct bot type as parent class for your bot. The intelmq.lib.bot module contains the classes CollectorBot, ParserBot, ExpertBot and OutputBot.
You can always start any bot directly from the command line by calling the executable. The executable will be created during installation in a directory for binaries. After adding new bots to the code, install IntelMQ to get the files created. Don't forget to give a bot id as the first argument. Also, running bots as users other than intelmq will raise permission errors.
$ sudo -i intelmq
$ intelmqctl run file-output # if configured
$ intelmq.bots.outputs.file.output file-output
You will get all logging outputs directly on stderr as well as in the log file.
Template¶
Please adjust the doc strings accordingly and remove the in-line comments (#).
"""
SPDX-FileCopyrightText: 2021 Your Name
SPDX-License-Identifier: AGPL-3.0-or-later
Parse data from example.com, be a nice ExampleParserBot.
Document possible necessary configurations.
"""
import sys
# imports for additional libraries and intelmq
from intelmq.lib.bot import ParserBot
class ExampleParserBot(ParserBot):
option1: str = "defaultvalue"
option2: bool = False
def process(self):
report = self.receive_message()
event = self.new_event(report) # copies feed.name, time.observation
... # implement the logic here
event.add('source.ip', '127.0.0.1')
event.add('extra', {"os.name": "Linux"})
if self.option2:
event.add('extra', {"customvalue": self.option1})
self.send_message(event)
self.acknowledge_message()
BOT = ExampleParserBot
Any attributes of the bot that are not private can be set by the user using the IntelMQ configuration settings.
There are some names with special meaning. These can be used i.e. called:
stop: Shuts the bot down.
receive_message, send_message, acknowledge_message: see next section
start: internal method to run the bot
These can be defined:
init: called at startup, use it to set up the bot (initializing classes, loading files etc)
process: processes the messages
shutdown: to gracefully stop the bot, e.g. terminate connections
All other names can be used freely.
Mixins¶
For common settings and methods you can use mixins from intelmq.lib.mixins. To use the mixins, just let your bot inherit from the Mixin class (in addition to the inheritance from the Bot class). For example:
class HTTPCollectorBot(CollectorBot, HttpMixin):
The following mixins are available:
HttpMixin
SqlMixin
CacheMixin
The HttpMixin provides the HTTP attributes described in Common parameters and the following methods:
http_get takes a URL as argument; any other arguments get passed to the requests.Session.get method. http_get returns a requests.Response.
http_session can be used if you ever want to work with the session object directly. It takes no arguments and returns the bot's requests.Session.
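A minimal sketch of a collector built on this mixin; only http_get is taken from the description above, while the feed URL and the http_url parameter name are illustrative assumptions:
from intelmq.lib.bot import CollectorBot
from intelmq.lib.mixins import HttpMixin

class ExampleHTTPCollectorBot(CollectorBot, HttpMixin):
    http_url: str = 'https://example.com/feed.csv'  # hypothetical parameter

    def process(self):
        response = self.http_get(self.http_url)  # returns a requests.Response
        report = self.new_report()
        report.add('raw', response.text)
        self.send_message(report)

BOT = ExampleHTTPCollectorBot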
The SqlMixin provides methods to connect to SQL servers. Inherit this mixin so that it handles the DB connection for you. You do not have to bother with:
connecting to the database in the self.init() method - self.cur will be set in __init__()
catching exceptions - just call self.execute() instead of self.cur.execute()
self.format_char will be set to '%s' in PostgreSQL and to '?' in SQLite
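A sketch of an output bot using these helpers; only self.execute and self.format_char are taken from the description above, while the table and column are assumptions:
from intelmq.lib.bot import OutputBot
from intelmq.lib.mixins import SqlMixin

class SqlExampleOutputBot(OutputBot, SqlMixin):
    def process(self):
        event = self.receive_message()
        # format_char is '%s' on PostgreSQL and '?' on SQLite:
        query = 'INSERT INTO events ("source.ip") VALUES ({})'.format(self.format_char)
        self.execute(query, (event.get('source.ip'),))  # exception handling is done by the mixin
        self.acknowledge_message()

BOT = SqlExampleOutputBot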
The CacheMixin provides methods to cache values for bots in a Redis database. It uses the following attributes:
redis_cache_host: str = "127.0.0.1"
redis_cache_port: int = 6379
redis_cache_db: int = 9
redis_cache_ttl: int = 15
redis_cache_password: Optional[str] = None
and provides the methods:
cache_exists
cache_get
cache_set
cache_flush
cache_get_redis_instance
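A sketch of an expert bot using the cache methods listed above; the exact key/value semantics of cache_exists and cache_set are assumptions based on the method names:
from intelmq.lib.bot import ExpertBot
from intelmq.lib.mixins import CacheMixin

class CacheExampleExpertBot(ExpertBot, CacheMixin):
    redis_cache_db: int = 10  # bots must use databases >= 10, see the Cache section

    def process(self):
        event = self.receive_message()
        ip = event.get('source.ip')
        if ip and not self.cache_exists(ip):
            self.cache_set(ip, 'seen')  # expires after redis_cache_ttl seconds
        self.send_message(event)
        self.acknowledge_message()

BOT = CacheExampleExpertBot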
Pipeline interactions¶
We can call three methods related to the pipeline:
self.receive_message(): The pipeline handler pops one message from the internal queue if possible. Otherwise one message from the source queue is popped and added to the internal queue. In case of errors during processing, the message can still be found in the internal queue and is not lost. The bot class unravels the message and creates an instance of the Event or Report class.
self.send_message(event, path="_default"): The processed message is sent to the destination queues. It is possible to change the destination queues with the optional path parameter (see the sketch after this list).
self.acknowledge_message(): Message formerly received by receive_message is removed from the internal queue. This should always be done after processing and after the sending of the new message. In case of errors, this function is not called and the message will stay in the internal queue waiting to be processed again.
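For illustration, a sketch of routing to a named path; the path name 'suspicious' is hypothetical, paths are named as described in the Named queues / paths section:
from intelmq.lib.bot import ExpertBot

class RoutingExampleExpertBot(ExpertBot):
    def process(self):
        event = self.receive_message()
        if event.get('classification.type') == 'phishing':
            self.send_message(event, path='suspicious')  # hypothetical path name
        else:
            self.send_message(event)  # the "_default" path
        self.acknowledge_message()

BOT = RoutingExampleExpertBot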
Logging¶
Log messages have to be clear and well formatted. The format is the following:
Format:
<timestamp> - <bot id> - <log level> - <log message>
Rules:
the log message MUST follow the common rules of a sentence, beginning with an uppercase letter and ending with a period.
the sentence MUST describe the problem or provide useful information giving an inexperienced user context. Pure stack traces without any further explanation are not helpful.
When the logger instance is created, the bot id must be given as parameter anyway. The function call defines the log level, see below.
debug: Debugging information includes retrieved and sent messages, detailed status information. Can include sensitive information like passwords and amount can be huge.
info: Logs include loaded databases, fetched reports or waiting messages.
warning: Unexpected, but handled behavior.
error: Errors and Exceptions.
critical: Program is failing.
Try to keep a balance between obscuring the source code file with hundreds of log messages and having too little log messages.
In general, a bot MUST report error conditions.
The Bot class creates a logger that should be used by bots. Other components currently do not log anyway.
The exception method automatically appends an exception traceback. The logger instance writes by default to the file /opt/intelmq/var/log/[bot-id].log and to stderr.
Parameters for string formatting are better passed as arguments to the log function, see https://docs.python.org/3/library/logging.html#logging.Logger.debug. In case of formatting problems, the error messages will be better. For example:
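A runnable sketch of both rules; inside a bot you would use self.logger, which the Bot class sets up, instead of creating a logger:
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger('example-bot')
url = 'https://example.com/feed'
# Pass format parameters as arguments instead of pre-formatting the string:
logger.info('Downloading report from %r.', url)
try:
    raise ValueError('demo error')
except ValueError:
    logger.exception('Failed to process %r.', url)  # appends the traceback automatically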
Error handling¶
The bot class itself has error handling implemented. The bot itself is allowed to throw exceptions and intended to fail! The bot should fail in case of malicious messages and in case of unavailable but necessary resources. The bot class handles the exception and will restart until the maximum number of tries is reached, then fail. Additionally, the message in question is dumped to the file /opt/intelmq/var/log/[bot-id].dump and removed from the queue.
Initialization¶
Maybe it is necessary to set up a Cache instance or load a file into memory. Use the init function for this purpose:
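A sketch of such an init, loading a file once at startup; the parameter name and file path are illustrative:
from intelmq.lib.bot import ExpertBot

class InitExampleExpertBot(ExpertBot):
    database_path: str = '/opt/intelmq/var/lib/bots/example/database.txt'  # hypothetical

    def init(self):
        # Called once at startup: load the file into memory.
        with open(self.database_path) as handle:
            self.database = set(handle.read().splitlines())

    def process(self):
        event = self.receive_message()
        if event.get('source.fqdn') in self.database:
            event.add('extra', {'listed': True})
        self.send_message(event)
        self.acknowledge_message()

BOT = InitExampleExpertBot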
Custom configuration checks¶
Every bot can define a static method check(parameters) which will be called by intelmqctl check. As an example, see the check function of the ASNLookupExpert.
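The real check function of the ASNLookupExpert is not reproduced here; the following is a hedged sketch of the general shape, assuming a convention of returning None when everything is fine and a list of [log level, message] pairs otherwise:
import os

from intelmq.lib.bot import ExpertBot

class CheckExampleExpertBot(ExpertBot):
    database: str = ''  # hypothetical parameter

    @staticmethod
    def check(parameters: dict):
        # Assumed convention: return None if fine, else a list of [log level, message] pairs.
        if not os.path.exists(parameters.get('database', '')):
            return [['error', 'File given as parameter "database" does not exist.']]

    def process(self):
        event = self.receive_message()
        self.send_message(event)
        self.acknowledge_message()

BOT = CheckExampleExpertBot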
Examples¶
Check Expert Bots
Check Parser Bots
Parsers¶
Parsers can use a different, specialized Bot class. It allows working on individual elements of a report, splitting the functionality of the parser into multiple functions:
process: getting and sending data, handling of failures etc.
parse: Parses the report and splits it into single elements (e.g. lines). Can be overridden.
parse_line: Parses elements, returns an Event. Can be overridden.
recover_line: In case of failures and for the field raw, this function recovers a fully functional report containing only one element. Can be overridden.
For common cases, like CSV, existing functions can be used, reducing the amount of code to implement. In the best case, only parse_line needs to be coded, as only this part interprets the data.
You can have a look at the implementation in intelmq/lib/bot.py or at examples, e.g. the DummyBot in intelmq/tests/lib/test_parser_bot.py, which is a stub for creating a new parser, showing the parameters and possible code.
One line can lead to multiple events, so parse_line can't just return one Event. Instead, this function is a generator, which allows easily returning multiple values. Use yield event for valid Events and return in case of a void result (unparseable line, invalid data etc.).
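A sketch of such a generator-style parse_line for a hypothetical comma-separated feed; the field layout is an assumption:
from intelmq.lib.bot import ParserBot

class ExampleFeedParserBot(ParserBot):
    def parse_line(self, line, report):
        if not line or line.startswith('#'):
            return  # void result: nothing to yield for empty or comment lines
        ip, url = line.split(',')
        event = self.new_event(report)
        event.add('source.ip', ip)
        event.add('source.url', url)
        event.add('raw', self.recover_line(line))
        yield event

BOT = ExampleFeedParserBot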
Tests¶
In order to do automated tests on the bot, it is necessary to write tests including sample data. Have a look at some existing tests:
The DummyParserBot in intelmq/tests/lib/test_parser_bot.py. This test has the example data (report and event) inside the file, defined as dictionary.
The parser for malwaregroup at intelmq/tests/bots/parsers/malwaregroup/test_parser_*.py. The latter loads a sample HTML file from the same directory, which is the raw report.
The test for ASNLookupExpertBot has two event tests, one is an expected fail (IPv6).
Ideally an example contains not only the ideal case which should succeed, but also a case which should fail instead. (TODO: Implement assertEventNotEqual or assertEventNotContainsSubset or similar.) Most existing bots are only tested with one message. For newly written tests it is appreciated to have tests including more than one message, e.g. a parser fed with a report consisting of multiple events.
When calling the file directly, only the tests in this file for the bot will be executed. Some default tests are always executed (via the test.BotTestCase class), such as pipeline and message checks, logging, bot naming or empty message handling.
See the Testing Pre-releases section about how to run the tests.
Cache¶
Bots can use a Redis database as a cache instance. Use the intelmq.lib.cache.Cache class to set this up and/or look at existing bots, like the cymru_whois expert, to see how the cache can be used. Bots must set a TTL for all keys that are cached to avoid caches growing endlessly over time. Bots must use the Redis databases >= 10, but not those already used by other bots. Run find intelmq -type f -name '*.py' -exec grep -r 'redis_cache_db' {} + to see which databases are already in use.
- The databases < 10 are reserved for the IntelMQ core:
2: pipeline
3: statistics
4: tests
Documentation¶
The documentation is automatically published to https://intelmq.readthedocs.io/ at every push to the repository.
To build the documentation you need three packages:
Sphinx
ReCommonMark
sphinx-markdown-tables
To install them, you can use pip:
pip3 install -r docs/requirements.txt
Then use the Makefile to build the documentation using Sphinx:
cd docs
make html
Feeds documentation¶
The feeds which are known to work with IntelMQ are documented in the machine-readable file intelmq/etc/feeds.yaml. The human-readable documentation is generated with the Sphinx build as described in the previous section.
Testing Pre-releases¶
Installation¶
The installation procedures need to be adapted only a little bit.
For native packages, you can find the unstable packages of the next version here: Installation Unstable Native Packages. The unstable repository only has a limited set of packages, so the stable repository can be enabled in parallel. For CentOS 8 unstable, the stable repository is required.
For the installation with pip, use the --pre parameter as shown in the following command:
pip3 install --pre intelmq
All other steps are not different. Please report any issues you find in our Issue Tracker.
Data Format¶
Contents
Overview¶
In IntelMQ version 3.x+, the internal data format name changed from DHO (IntelMQ Data Harmonization) to IDF (IntelMQ Data Format). The Python module intelmq.lib.harmonization and the configuration file harmonization.conf keep the name harmonization for now. DHO and IDF have the same meaning.
All messages (reports and events) are Python/JSON dictionaries. The key names and according types are defined by the IntelMQ Data Format.
The purpose of this document is to list and clearly define known fields in Abusehelper as well as IntelMQ or similar systems.
A field is a `key=value` pair. For a clear and unique definition of a field, we must define the key (field name) as well as the possible values.
A field belongs to an event. An event is basically a structured log record in the form `key=value, key=value, key=value, …`.
In the List of known fields, each field is grouped by a section. We describe these sections briefly below.
Every event MUST contain a timestamp field.
An IOC (Indicator of compromise) is a single observation like a log line.
Rules for keys¶
The keys can be grouped together in sub-fields, e.g. source.ip or source.geolocation.latitude.
Only the lower-case alphabet, numbers and the underscore are allowed. Further, the field name must not begin with a number.
Thus, keys must match ^[a-z_][a-z_0-9]+(\.[a-z_0-9]+)*$.
These rules also apply to the otherwise unregulated extra. namespace.
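A quick check of a key against this rule, as a sketch using the regular expression quoted above:
import re

KEY_PATTERN = re.compile(r'^[a-z_][a-z_0-9]+(\.[a-z_0-9]+)*$')
assert KEY_PATTERN.match('source.geolocation.latitude')
assert KEY_PATTERN.match('extra.os_name')
assert not KEY_PATTERN.match('1invalid.Key')  # must not begin with a number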
Sections¶
As stated above, every field is organized under some section. The following is a description of the sections and what they imply.
Feed¶
Fields listed under this grouping list details about the source feed where information came from.
Time¶
The time section lists all fields related to time information. This document requires that all the timestamps MUST be normalized to UTC. If the source reports only a date, do not attempt to invent timestamps.
Source Identity¶
This section lists all fields related to identification of the source. The source is the identity the IoC is about, as opposed to the destination identity, which is another identity.
For examples see the table below.
The abuse type of an event defines the way these events need to be interpreted. For example, for a botnet drone they refer to the compromised machine, whereas for a command and control server they refer to the server itself.
Source Geolocation Identity¶
We recognize that IP geolocation is not an exact science, and analysis of the abuse data has shown that different attribution sources have different opinions of the geolocation of an IP address. This is why we recommend enriching the data with as many sources as you have available and making the decision which value to use for the cc IOC based on those answers.
Source Local Identity¶
Some sources report an internal (NATed) IP address.
Destination Identity¶
The abuse type of an event defines the way these IOCs need to be interpreted. For a botnet drone they refer to the compromised machine, whereas for a command and control server they refer to the server itself.
Destination Geolocation Identity¶
We recognize that IP geolocation is not an exact science, and analysis of the abuse data has shown that different attribution sources have different opinions of the geolocation of an IP address. This is why we recommend enriching the data with as many sources as you have available and making the decision which value to use for the cc IOC based on those answers.
Destination Local Identity¶
Some sources report an internal (NATed) IP address.
Extra values¶
Data which does not fit into the format can be saved in the 'extra' namespace. All keys must begin with extra.; there are no other rules on key names and values. The values can be read and written like all other fields.
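As a sketch, extra fields can be read and written through intelmq.lib.message.Event like any harmonized field; this assumes a default IntelMQ installation so the harmonization configuration can be loaded:

```python
from intelmq.lib.message import Event

# Assumes the default harmonization.conf can be found (installed IntelMQ).
event = Event()
event.add('source.ip', '192.0.2.1')

# Anything without a harmonized field goes into the extra. namespace:
event.add('extra.os_name', 'Windows')
event.add('extra.os_version', '10')

print(event['extra.os_name'])  # read back like any other field
```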
Fields List and data types¶
A list of allowed fields and data types can be found in format-fields.
Classification¶
IntelMQ classifies events using three labels: taxonomy, type and identifier. This tuple of three values can be used for deduplication of events and describes what happened.
The taxonomy can be automatically added by the taxonomy expert bot based on the given type. The following classification scheme follows the Reference Security Incident Taxonomy (RSIT):
| Taxonomy | Type | Description |
|---|---|---|
| abusive-content | harmful-speech | Discreditation or discrimination of somebody, e.g. cyber stalking, racism or threats against one or more individuals. |
| abusive-content | spam | Or 'Unsolicited Bulk Email', this means that the recipient has not granted verifiable permission for the message to be sent and that the message is sent as part of a larger collection of messages, all having a functionally comparable content. |
| abusive-content | violence | Child pornography, glorification of violence, etc. |
| availability | ddos | Distributed Denial of Service attack, e.g. SYN flood or UDP-based reflection/amplification attacks. |
| availability | dos | Denial of Service attack, e.g. sending specially crafted requests to a web application which causes the application to crash or slow down. |
| availability | misconfiguration | Software misconfiguration resulting in service availability issues, e.g. a DNS server with an outdated DNSSEC Root Zone KSK. |
| availability | outage | Outage caused e.g. by air conditioning failure or natural disaster. |
| availability | sabotage | Physical sabotage, e.g. cutting wires or malicious arson. |
| fraud | copyright | Offering or installing copies of unlicensed commercial software or other copyright-protected materials (warez). |
| fraud | masquerade | Type of attack in which one entity illegitimately impersonates the identity of another in order to benefit from it. |
| fraud | phishing | Masquerading as another entity in order to persuade the user to reveal private credentials. |
| fraud | unauthorized-use-of-resources | Using resources for unauthorized purposes including profit-making ventures, e.g. the use of e-mail to participate in illegal profit chain letters or pyramid schemes. |
| information-content-security | data-leak | Leaked confidential information like credentials or personal data. |
| information-content-security | data-loss | Loss of data, e.g. caused by hard disk failure or physical theft. |
| information-content-security | unauthorised-information-access | Unauthorized access to information, e.g. by abusing stolen login credentials for a system or application, intercepting traffic or gaining access to physical documents. |
| information-content-security | unauthorised-information-modification | Unauthorised modification of information, e.g. by an attacker abusing stolen login credentials for a system or application, or a ransomware encrypting data. |
| information-gathering | scanner | Attacks that send requests to a system to discover weaknesses. This also includes testing processes to gather information on hosts, services and accounts. Examples: fingerd, DNS querying, ICMP, SMTP (EXPN, RCPT, …), port scanning. |
| information-gathering | sniffing | Observing and recording of network traffic (wiretapping). |
| information-gathering | social-engineering | Gathering information from a human being in a non-technical way (e.g. lies, tricks, bribes, or threats). |
| intrusion-attempts | brute-force | Multiple login attempts (guessing/cracking of passwords, brute force). This IOC refers to a resource which has been observed to perform brute-force attacks over a given application protocol. |
| intrusion-attempts | exploit | An attack using an unknown exploit. |
| intrusion-attempts | ids-alert | IOCs based on a sensor network. This is a generic IOC denomination, should it be difficult to reliably denote the exact type of activity involved, for example due to an anecdotal nature of the rule that triggered the alert. |
| intrusions | application-compromise | Compromise of an application by exploiting (un)known software vulnerabilities, e.g. SQL injection. |
| intrusions | burglary | Physical intrusion, e.g. into a corporate building or data centre. |
| intrusions | privileged-account-compromise | Compromise of a system where the attacker gained administrative privileges. |
| intrusions | system-compromise | Compromise of a system, e.g. unauthorised logins or commands. This includes compromising attempts on honeypot systems. |
| intrusions | unprivileged-account-compromise | Compromise of a system using an unprivileged (user/service) account. |
| malicious-code | c2-server | This is a command and control server in charge of a given number of botnet drones. |
| malicious-code | infected-system | This is a compromised machine, which has been observed to make a connection to a command and control server. |
| malicious-code | malware-configuration | This is a resource which updates botnet drones with a new configuration. |
| malicious-code | malware-distribution | URI used for malware distribution, e.g. a download URL included in fake invoice malware spam. |
| other | blacklist | Some sources provide blacklists, which clearly refer to abusive behavior, such as spamming, but fail to denote the exact reason why a given identity has been blacklisted. The reason may be that the justification is anecdotal or missing entirely. This type should only be used if the typing fits the definition of a blacklist, but an event specific denomination is not possible for one reason or another. Not in RSIT. |
| other | dga-domain | DGA domains are seen in various families of malware that are used to periodically generate a large number of domain names that can be used as rendezvous points with their command and control servers. Not in RSIT. |
| other | other | All incidents which don't fit in one of the given categories should be put into this class. |
| other | malware | An IOC referring to a malware (sample) itself. Not in RSIT. |
| other | proxy | This refers to the use of proxies from inside your network. Not in RSIT. |
| test | test | Meant for testing. Not in RSIT. |
| other | tor | This IOC refers to incidents related to Tor network infrastructure. Not in RSIT. |
| other | undetermined | The categorisation of the incident is unknown/undetermined. |
| vulnerable | ddos-amplifier | Publicly accessible services that can be abused for conducting DDoS reflection/amplification attacks, e.g. DNS open resolvers or NTP servers with monlist enabled. |
| vulnerable | information-disclosure | Publicly accessible services potentially disclosing sensitive information, e.g. SNMP or Redis. |
| vulnerable | potentially-unwanted-accessible | Potentially unwanted publicly accessible services, e.g. Telnet, RDP or VNC. |
| vulnerable | vulnerable-system | A system which is vulnerable to certain attacks, e.g. misconfigured client proxy settings (such as WPAD) or an outdated operating system version. |
| vulnerable | weak-crypto | Publicly accessible services offering weak crypto, e.g. web servers susceptible to POODLE/FREAK attacks. |
In the “other” taxonomy, several types are not in the RSIT, but this taxonomy is intentionally extensible.
Meaning of source and destination identities¶
The following table shows the meaning of the source and destination fields for each classification type, together with possible classification.identifier values. Usually the main information is in the source fields. The identifier is often a normalized malware name (grouping many variants) or the affected network protocol.
| Type | Source | Destination | Possible identifiers |
|---|---|---|---|
| blacklist | blacklisted device | | |
| brute-force | attacker | target | |
| c2-server | (sinkholed) c&c server | | zeus, palevo, feodo |
| ddos | attacker | target | |
| dga-domain | infected device | | |
| dropzone | server hosting stolen data | | |
| exploit | hosting server | | |
| ids-alert | triggering device | | |
| infected-system | infected device | contacted c2 server | |
| malware | infected device | | zeus, palevo, feodo |
| malware-configuration | infected device | | |
| malware-distribution | server hosting malware | | |
| phishing | phishing website | | |
| proxy | server allowing policy and security bypass | | |
| scanner | scanning device | scanned device | http, modbus, wordpress |
| spam | infected device | targeted server | |
| system-compromise | server | | |
| vulnerable-system | vulnerable device | | heartbleed, openresolver, snmp, wpad |
Field in italics is the interesting one for CERTs.
Example:
If you know of an IP address that connects to a zeus c&c server, the event is about the infected device: classification.taxonomy is malicious-code, classification.type is infected-system and classification.identifier is zeus. If you want to complain about the c&c server itself, the event's classification.type is c2-server. The malware.name can hold the full name, e.g. zeus_p2p.
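Sketched as a simplified, illustrative event, the first case could look like this:

```python
# Illustrative only: a minimal event for the infected-device case above.
event = {
    "classification.taxonomy": "malicious-code",
    "classification.type": "infected-system",
    "classification.identifier": "zeus",
    "malware.name": "zeus_p2p",          # the full, normalized malware name
    "source.ip": "192.0.2.7",            # the infected device
    "destination.ip": "198.51.100.23",   # the contacted c&c server
}
```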
Minimum recommended requirements for events¶
Below, we have enumerated the minimum recommended requirements for an actionable abuse event. These keys should be present for the abuse report to make sense for the end recipient. Please note that if you choose to anonymize your sources, you can substitute feed.name with feed.code, and that only one of the identity keys (IP, domain name, URL, email address) must be present. All the rest of the keys are optional.
| Category | Key | Terminology |
|---|---|---|
| Feed | feed.name | Should |
| Classification | classification.type | Should |
| Classification | classification.taxonomy | Should |
| Time | time.source | Should |
| Time | time.observation | Should |
| Identity | source.ip | Should* |
| Identity | source.fqdn | Should* |
| Identity | source.url | Should* |
| Identity | source.account | Should* |

\* only one of them
This list of required fields is not enforced by IntelMQ.
NOTE: This document was copied from the AbuseHelper repository (now Arctic Security Public documents) and improved.
Harmonization field names¶
The Type column refers to the Harmonization types described below.

| Section | Name | Type | Description |
|---|---|---|---|
| Classification | classification.identifier | String | The lowercase identifier defines the actual software or service (e.g. heartbleed or ntp_version) or the malware family (e.g. zeus_p2p). |
| Classification | classification.taxonomy | ClassificationTaxonomy | We recognize the need for the CSIRT teams to apply a static (incident) taxonomy to abuse data. With this goal in mind the type IOC will serve as a basis for this activity. Each value of the dynamic type mapping translates to an element in the static taxonomy. The European CSIRT teams for example have decided to apply the eCSIRT.net incident classification. The value of the taxonomy key is thus a derivative of the dynamic type above. For more information, check the ENISA taxonomies. |
| Classification | classification.type | ClassificationType | The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy above, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid type explosion, which in turn dilutes the business value of the dynamic typing. In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as command and control servers. |
| | comment | String | Free text commentary about the abuse event inserted by an analyst. |
| Destination | destination.abuse_contact | LowercaseString | Abuse contact for destination address. A comma separated list. |
| Destination | destination.account | String | An account name or email address, which has been identified to relate to the destination of an abuse event. |
| Destination | destination.allocated | DateTime | Allocation date corresponding to a BGP prefix. |
| Destination | destination.as_name | String | The autonomous system name to which the connection headed. |
| Destination | destination.asn | ASN | The autonomous system number to which the connection headed. |
| Destination | destination.domain_suffix | FQDN | The suffix of the domain from the public suffix list. |
| Destination | destination.fqdn | FQDN | A DNS name related to the host to which the connection headed. DNS allows even binary data, so we have to allow everything. A trailing dot is stripped and the string is converted to lower case. |
| Destination Geolocation | destination.geolocation.cc | UppercaseString | Country code according to ISO 3166-1 alpha-2 for the destination IP. |
| Destination Geolocation | destination.geolocation.city | String | Some geolocation services refer to city-level geolocation. |
| Destination Geolocation | destination.geolocation.country | String | The country name derived from the ISO 3166 country code (assigned to the cc field). |
| Destination Geolocation | destination.geolocation.latitude | Float | Latitude coordinates derived from a geolocation service, such as the MaxMind GeoIP database. |
| Destination Geolocation | destination.geolocation.longitude | Float | Longitude coordinates derived from a geolocation service, such as the MaxMind GeoIP database. |
| Destination Geolocation | destination.geolocation.region | String | Some geolocation services refer to region-level geolocation. |
| Destination Geolocation | destination.geolocation.state | String | Some geolocation services refer to state-level geolocation. |
| Destination | destination.ip | IPAddress | The IP which is the target of the observed connections. |
| Destination | destination.local_hostname | String | Some sources report an internal hostname within a NAT related to the name configured for a compromised system. |
| Destination | destination.local_ip | IPAddress | Some sources report an internal (NATed) IP address related to a compromised system. N.B. RFC 1918 IPs are OK here. |
| Destination | destination.network | IPNetwork | CIDR for an autonomous system. Also known as BGP prefix. If multiple values are possible, select the most specific. |
| Destination | destination.port | Integer | The port to which the connection headed. |
| Destination | destination.registry | Registry | The IP registry a given IP address is allocated by. |
| Destination | destination.reverse_dns | FQDN | Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A trailing dot is stripped and the string is converted to lower case. |
| Destination | destination.tor_node | Boolean | If the destination IP was a known Tor node. |
| Destination | destination.url | URL | A URL denotes an IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource. |
| Destination | destination.urlpath | String | The path portion of an HTTP or related network request. |
| Event_Description | event_description.target | String | Some sources denominate the target (organization) of an attack. |
| Event_Description | event_description.text | String | A free-form textual description of an abuse event. |
| Event_Description | event_description.url | URL | A description URL is a link to a further description of the abuse event in question. |
| | event_hash | UppercaseString | Computed event hash with specific keys and values that identify a unique event. At present, the hash should default to using the SHA1 function. Please note that for an event hash to be able to match more than one event (deduplication) the receiver of an event should calculate it based on a minimal set of keys and values present in the event. Using for example the observation time in the calculation will most likely render the checksum useless for deduplication purposes. |
| | extra | JSONDict | All anecdotal information which cannot be parsed into the data harmonization elements, e.g. os.name, os.version, etc. Note: this is only intended for mapping any fields which can not map naturally into the data harmonization. It is not intended for extending the data harmonization with your own fields. |
| Feed | feed.accuracy | Accuracy | A float between 0 and 100 that represents how accurate the data in the feed is. |
| Feed | feed.code | String | Code name for the feed, e.g. DFGS, HSDAG etc. |
| Feed | feed.documentation | String | A URL or hint where to find the documentation of this feed. |
| Feed | feed.name | String | Name for the feed, usually found in the collector bot configuration. |
| Feed | feed.provider | String | Name of the provider of the feed, usually found in the collector bot configuration. |
| Feed | feed.url | URL | The URL of a given abuse feed, where applicable. |
| Malware Hash | malware.hash.md5 | String | A string depicting an MD5 checksum for a file, be it a malware sample for example. |
| Malware Hash | malware.hash.sha1 | String | A string depicting a SHA1 checksum for a file, be it a malware sample for example. |
| Malware Hash | malware.hash.sha256 | String | A string depicting a SHA256 checksum for a file, be it a malware sample for example. |
| Malware | malware.name | LowercaseString | The malware name in lower case. |
| Malware | malware.version | String | A version string for an identified artifact generation, e.g. a crime-ware kit. |
| Misp | misp.attribute_uuid | LowercaseString | MISP - Malware Information Sharing Platform & Threat Sharing UUID of an attribute. |
| Misp | misp.event_uuid | LowercaseString | MISP - Malware Information Sharing Platform & Threat Sharing UUID. |
| | output | JSON | Event data converted into a foreign format, intended to be exported by an output plugin. |
| Protocol | protocol.application | LowercaseString | e.g. vnc, ssh, sip, irc, http or smtp. |
| Protocol | protocol.transport | LowercaseString | e.g. tcp, udp, icmp. |
| | raw | Base64 | The original line of the event, encoded in base64. |
| | rtir_id | Integer | Request Tracker Incident Response ticket id. |
| | screenshot_url | URL | Some sources may report URLs related to an image generated of a resource without any metadata, or a URL pointing to a resource which has been rendered into a webshot (e.g. a PNG image) together with the relevant metadata related to its retrieval/generation. |
| Source | source.abuse_contact | LowercaseString | Abuse contact for source address. A comma separated list. |
| Source | source.account | String | An account name or email address, which has been identified to relate to the source of an abuse event. |
| Source | source.allocated | DateTime | Allocation date corresponding to a BGP prefix. |
| Source | source.as_name | String | The autonomous system name from which the connection originated. |
| Source | source.asn | ASN | The autonomous system number from which the connection originated. |
| Source | source.domain_suffix | FQDN | The suffix of the domain from the public suffix list. |
| Source | source.fqdn | FQDN | A DNS name related to the host from which the connection originated. DNS allows even binary data, so we have to allow everything. A trailing dot is stripped and the string is converted to lower case. |
| Source Geolocation | source.geolocation.cc | UppercaseString | Country code according to ISO 3166-1 alpha-2 for the source IP. |
| Source Geolocation | source.geolocation.city | String | Some geolocation services refer to city-level geolocation. |
| Source Geolocation | source.geolocation.country | String | The country name derived from the ISO 3166 country code (assigned to the cc field). |
| Source Geolocation | source.geolocation.cymru_cc | UppercaseString | The country code denoted for the IP by the Team Cymru ASN-to-IP mapping service. |
| Source Geolocation | source.geolocation.geoip_cc | UppercaseString | MaxMind country code (ISO 3166-1 alpha-2). |
| Source Geolocation | source.geolocation.latitude | Float | Latitude coordinates derived from a geolocation service, such as the MaxMind GeoIP database. |
| Source Geolocation | source.geolocation.longitude | Float | Longitude coordinates derived from a geolocation service, such as the MaxMind GeoIP database. |
| Source Geolocation | source.geolocation.region | String | Some geolocation services refer to region-level geolocation. |
| Source Geolocation | source.geolocation.state | String | Some geolocation services refer to state-level geolocation. |
| Source | source.ip | IPAddress | The IP observed to initiate the connection. |
| Source | source.local_hostname | String | Some sources report an internal hostname within a NAT related to the name configured for a compromised system. |
| Source | source.local_ip | IPAddress | Some sources report an internal (NATed) IP address related to a compromised system. N.B. RFC 1918 IPs are OK here. |
| Source | source.network | IPNetwork | CIDR for an autonomous system. Also known as BGP prefix. If multiple values are possible, select the most specific. |
| Source | source.port | Integer | The port from which the connection originated. |
| Source | source.registry | Registry | The IP registry a given IP address is allocated by. |
| Source | source.reverse_dns | FQDN | Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A trailing dot is stripped and the string is converted to lower case. |
| Source | source.tor_node | Boolean | If the source IP was a known Tor node. |
| Source | source.url | URL | A URL denotes an IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource. |
| Source | source.urlpath | String | The path portion of an HTTP or related network request. |
| | status | String | Status of the malicious resource (phishing, dropzone, etc.), e.g. online, offline. |
| Time | time.observation | DateTime | The time the collector of the local instance processed (observed) the event. |
| Time | time.source | DateTime | The time of occurrence of the event as reported by the feed (source). |
| | tlp | TLP | Traffic Light Protocol level of the event. |
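To illustrate the event_hash idea described above (a sketch, not IntelMQ's implementation): compute a SHA1 over a minimal, stable subset of keys so that deduplication works across observations:

```python
import hashlib
import json

def event_hash(event: dict, keys=('source.ip', 'classification.type')) -> str:
    # Hash only a minimal, stable subset of keys; including e.g.
    # time.observation would render the hash useless for deduplication.
    subset = {key: event[key] for key in sorted(keys) if key in event}
    digest = hashlib.sha1(json.dumps(subset, sort_keys=True).encode())
    return digest.hexdigest().upper()  # upper case, matching the field type above
```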
Harmonization types¶
ASN¶
ASN type. Derived from Integer with forbidden values.
Only valid values: 0 < asn <= 4294967295. See https://en.wikipedia.org/wiki/Autonomous_system_(Internet):

> The first and last ASNs of the original 16-bit integers, namely 0 and 65,535, and the last ASN of the 32-bit numbers, namely 4,294,967,295, are reserved and should not be used by operators.
Accuracy¶
Accuracy type. A Float between 0 and 100.
Base64¶
Base64 type. Always gives unicode strings.
Sanitation encodes to base64 and accepts binary and unicode strings.
Boolean¶
Boolean type. Without sanitation only Python bool is accepted.
Sanitation accepts string ‘true’ and ‘false’ and integers 0 and 1.
ClassificationTaxonomy¶
classification.taxonomy type.
The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/
These old values are automatically mapped to the new ones:

- 'abusive content' -> 'abusive-content'
- 'information gathering' -> 'information-gathering'
- 'intrusion attempts' -> 'intrusion-attempts'
- 'malicious code' -> 'malicious-code'

Allowed values are:
abusive-content
availability
fraud
information-content-security
information-gathering
intrusion-attempts
intrusions
malicious-code
other
test
vulnerable
ClassificationType¶
classification.type type.
The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/ with extensions.
These old values are automatically mapped to the new ones:

- 'botnet drone' -> 'infected-system'
- 'ids alert' -> 'ids-alert'
- 'c&c' -> 'c2-server'
- 'c2server' -> 'c2-server'
- 'infected system' -> 'infected-system'
- 'malware configuration' -> 'malware-configuration'
- 'Unauthorised-information-access' -> 'unauthorised-information-access'
- 'leak' -> 'data-leak'
- 'vulnerable client' -> 'vulnerable-system'
- 'vulnerable service' -> 'vulnerable-system'
- 'ransomware' -> 'infected-system'
- 'unknown' -> 'undetermined'

These values changed their taxonomy:

- 'malware': within the taxonomy 'malicious-code', such events are classified as either 'infected-system' or 'malware-distribution'; the type 'malware' itself now belongs to the taxonomy 'other'.

Allowed values are:
application-compromise
blacklist
brute-force
burglary
c2-server
copyright
data-leak
data-loss
ddos
ddos-amplifier
dga-domain
dos
exploit
harmful-speech
ids-alert
infected-system
information-disclosure
malware
malware-configuration
malware-distribution
masquerade
misconfiguration
other
outage
phishing
potentially-unwanted-accessible
privileged-account-compromise
proxy
sabotage
scanner
sniffing
social-engineering
spam
system-compromise
test
tor
unauthorised-information-access
unauthorised-information-modification
unauthorized-use-of-resources
undetermined
unprivileged-account-compromise
violence
vulnerable-system
weak-crypto
DateTime¶
Date and time type for timestamps.
Valid values are timestamps with timezone in the format '%Y-%m-%dT%H:%M:%S+00:00'. Missing times and missing timezone information are invalid. Microseconds are also allowed.
Sanitation normalizes the timezone to UTC, which is the only allowed timezone.
The following additional conversions are available with the convert function:
timestamp
windows_nt: From Windows NT / AD / LDAP
epoch_millis: From Milliseconds since Epoch
from_format: From a given format, e.g. 'from_format|%H %M %S %m %d %Y %Z'
from_format_midnight: Date from a given format and assume midnight, e.g. ‘from_format_midnight|%d-%m-%Y’
utc_isoformat: Parse date generated by datetime.isoformat()
fuzzy (or None): Use dateutil's fuzzy parser, the default if no specific parser is given
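A small sketch of calling these converters via intelmq.lib.harmonization; the exact call shape (DateTime.convert with a format argument) is an assumption, so consult the module for the authoritative API:

```python
from intelmq.lib.harmonization import DateTime

# Assumed call shape: DateTime.convert(value, format=<converter from the list above>).
print(DateTime.convert(1616496000, format='timestamp'))
# expected: an ISO timestamp in UTC, e.g. '2021-03-23T10:40:00+00:00'
print(DateTime.convert('01-03-2021', format='from_format_midnight|%d-%m-%Y'))
# expected: '2021-03-01T00:00:00+00:00'
```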
FQDN¶
Fully qualified domain name type.
All valid lowercase domains are accepted, no IP addresses or URLs. Trailing dot is not allowed.
To prevent values like ‘10.0.0.1:8080’ (#1235), we check for the non-existence of ‘:’.
Float¶
Float type. Without sanitation only Python float/int is accepted. Boolean is explicitly denied.
Sanitation accepts strings and everything float() accepts.
IPAddress¶
Type for IP addresses, all families. Uses the ipaddress module.
Sanitation accepts integers, strings and objects of ipaddress.IPv4Address and ipaddress.IPv6Address.
Valid values are only strings. 0.0.0.0 is explicitly not allowed.
IPNetwork¶
Type for IP networks, all families. Uses the ipaddress module.
Sanitation accepts strings and objects of ipaddress.IPv4Network and ipaddress.IPv6Network. If host bits in strings are set, they will be ignored (e.g. 127.0.0.1/32).
Valid values are only strings.
Integer¶
Integer type. Without sanitation only Python int is accepted. Boolean is explicitly denied.
Sanitation accepts strings and everything int() accepts.
JSON¶
JSON type.
Sanitation accepts any valid JSON objects.
Valid values are only unicode strings with JSON objects.
JSONDict¶
JSONDict type.
Sanitation accepts Python dictionaries and JSON strings.
Valid values are only unicode strings with JSON dictionaries.
LowercaseString¶
Like string, but only allows lower case characters.
Sanitation lowercases all characters.
Registry¶
Registry type. Derived from UppercaseString.
Only valid values: AFRINIC, APNIC, ARIN, LACNIC, RIPE. RIPE-NCC and RIPENCC are normalized to RIPE.
String¶
Any non-empty string without leading or trailing whitespace.
TLP¶
TLP level type. Derived from UppercaseString.
Only valid values: WHITE, GREEN, AMBER, RED.
Accepted for sanitation are different cases and the prefix ‘tlp:’.
URL¶
URI type. Local and remote.
Sanitation converts hxxp and hxxps to http and https. For local URIs (file) a missing host is replaced by localhost.
Valid values must have the host (network location part).
UppercaseString¶
Like string, but only allows upper case characters.
Sanitation uppercases all characters.
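To get a feel for validation versus sanitation across these types, here is a minimal sketch using intelmq.lib.harmonization (is_valid and sanitize are the expected entry points; treat the exact outputs as assumptions and consult the module):

```python
from intelmq.lib.harmonization import FQDN, IPAddress, TLP, URL

print(IPAddress.is_valid('192.0.2.1'))                # True: already valid
print(IPAddress.is_valid(3221225985, sanitize=True))  # sanitation accepts integers

print(FQDN.sanitize('Example.COM.'))         # trailing dot stripped, lowercased
print(TLP.sanitize('tlp:amber'))             # 'tlp:' prefix accepted, uppercased
print(URL.sanitize('hxxps://example.com/'))  # hxxps converted to https
```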
Release procedure¶
Contents
General assumption: You are working on branch maintenance, the next version is a bug fix release. For feature releases it is slightly different.
Check before¶
Make sure the current state is really final ;) You can test most of the steps described here locally before doing them for real.
Check the upgrade functions in intelmq/lib/upgrades.py.
Close the milestone on GitHub and move any open issues to the next one.
docs/user/installation.rst: Update supported operating systems.
Documentation¶
These apply to all projects:
CHANGELOG.md and NEWS.md: Update the latest header, fix the order, remove empty sections and (re)group the entries if necessary.
debian/changelog: Insert a new section for the new version with the tool dch, or update the version of the existing last item if it is still unreleased. Don't forget the revision after the version number!
IntelMQ¶
intelmq/version.py: Update the version.
Adapt the default log levels if necessary; they should be INFO for stable releases.
IntelMQ API¶
intelmq_api/version.py: Update the version.
IntelMQ Manager¶
intelmq_manager/version.py: Update the version.
intelmq_manager/static/js/about.js: Update the version.
Commit, push, review and merge¶
Commit your changes in a separate branch, the final commit message should start with REL:. Push and create a pull request to maintenance, and after that from maintenance to master. Someone else should review the changes. Fix them if needed, and make sure the REL: commit is the last one; you can also push it last, after the reviews.
Why a separate branch? Because if problems show up, you can still force-push to that one, keeping the release commit the latest one.
Tag and release¶
Tag the commit with git tag -s version HEAD, merge it into master, then push the branches and the tag. The tag is just a.b.c, not prefixed with v (that was necessary only with SVN a long time ago…).
Go to https://github.com/certtools/intelmq/tags and enter the release notes (from the CHANGELOG) for the new tag, then it’s considered a release by GitHub.
Tarballs and PyPI¶
Build the source and binary (wheel) distribution:
rm -r build/
python3 setup.py sdist bdist_wheel
Upload the files including signatures to PyPI with e.g. twine: twine upload -s dist/intelmq…
Packages¶
We are currently using the public Open Build Service instance of openSUSE: http://build.opensuse.org/project/show/home:sebix:intelmq
First, test all the steps with the unstable repository and check that at least installations succeed.
Create the tarballs with the script create-archives.sh.
Update the dsc and spec files for new filenames and versions.
Update the .changes file
Build locally for all distributions.
Commit.
Docker Image¶
Releasing a new Docker image is very easy.
1. Clone the IntelMQ Docker repository with git clone https://github.com/certat/intelmq-docker.git --recursive, as this repository contains submodules.
2. If the intelmq-docker repository is not updated yet, use git pull --recurse-submodules to pull the latest changes from their respective repositories.
3. Run ./build.sh and check your console output whether the build was successful.
4. Run ./test.sh - it will run nosetests3 with the exotic flag. All errors/warnings will be displayed.
5. Change the build_version in publish.sh to the new version you want to release.
6. Change the namespace variable in publish.sh.
7. If no error/warning was shown, you can release with ./publish.sh.
8. Update the DockerHub ReadMe and add the latest version.
9. Commit and push the updates to the intelmq-docker repository.
Announcements¶
Announce the new version at the mailinglists intelmq-users, intelmq-dev. For bigger releases, probably also at IHAP, Twitter, etc. Ask your favorite social media consultant.
Prepare new version¶
Increase the version in intelmq/version.py and declare it as alpha version. Add the new version in intelmq/lib/upgrades.py. Add a new entry in debian/changelog with dch -v [version] -c debian/changelog.
Add new entries to CHANGELOG.md and NEWS.md.
IntelMQ¶
For CHANGELOG.md:
### Configuration
### Core
### Development
### Data Format
### Bots
#### Collectors
#### Parsers
#### Experts
#### Outputs
### Documentation
### Packaging
### Tests
### Tools
### Contrib
### Known issues
And for NEWS.md:
### Requirements
### Tools
### Data Format
### Configuration
### Libraries
### Postgres databases
IntelMQ API¶
An empty section of CHANGELOG.rst.
IntelMQ Manager¶
For CHANGELOG.md:
### Pages
#### Landing page
#### Configuration
#### Management
#### Monitor
#### Check
### Documentation
### Third-party libraries
### Packaging
### Known issues
And an empty section in the NEWS.md file.
Feeds wishlist¶
This is a list of various feeds which are either currently not supported or whose usage is not clearly documented in IntelMQ.
If you want to contribute documenting how to configure existing bots in order to collect new feeds or by creating new parsers, here is a list of potentially interesting feeds. See Feeds documentation for more information on this.
This list evolved from the issue Contribute: Feeds List (#384).
Lists of feeds:
Some third party intelmq bots: NRDCS’ IntelMQ fork
List of potentially interesting data sources:
Maltrail:
Mass Scanners (for whitelisting)
Phishstats, offers JSON ("API") and CSV download.
RST Threat Feed (offers a free and a commercial feed)
intelmq¶
intelmq package¶
Subpackages¶
intelmq.bin package¶
Generates a MISP object template, see https://github.com/MISP/misp-objects/
Generates a SQL command file with commands to create the events table.
Reads the harmonization configuration and generates an SQL command from it. The SQL file is saved in /tmp/initdb.sql or a temporary name if the other one exists.
- intelmq.bin.intelmq_psql_initdb.generate(harmonization_file='/opt/intelmq/etc/harmonization.conf')¶
- intelmq.bin.intelmq_psql_initdb.main()¶
- class intelmq.bin.intelmqctl.IntelMQController(interactive: bool = False, returntype: ReturnType = ReturnType.PYTHON, quiet: bool = False, no_file_logging: bool = False, drop_privileges: bool = True)¶
Bases:
object
- __init__(interactive: bool = False, returntype: ReturnType = ReturnType.PYTHON, quiet: bool = False, no_file_logging: bool = False, drop_privileges: bool = True) None ¶
Initializes intelmqctl.
- Parameters
interactive – for cli-interface true, functions can exit, parameters are used
return_type –
ReturnType.PYTHON (*) – no special treatment, can be used for use by other python code
ReturnType.TEXT (*) – user-friendly output for cli, default for interactive use
ReturnType.JSON (*) – machine-readable output for managers
quiet – False by default, can be activated for cron jobs etc.
no_file_logging – do not log to the log file
drop_privileges – Drop privileges and fail if it did not work.
- abort(message)¶
- bot_disable(bot_id)¶
If Bot is already disabled, the “Bot … is disabled” message is printed by the wrapping function already.
- bot_enable(bot_id)¶
- bot_reload(bot_id, getstatus=True, group=None)¶
- bot_restart(bot_id, group=None)¶
- bot_run(**kwargs)¶
- bot_start(bot_id, getstatus=True, group=None)¶
- bot_status(bot_id, group=None)¶
- bot_stop(bot_id, getstatus=True, group=None)¶
- botnet_reload(group=None)¶
- botnet_restart(group=None)¶
- botnet_start(group=None)¶
- botnet_status(group=None)¶
- botnet_stop(group=None)¶
- check(no_connections=False, check_executables=True)¶
- clear_queue(queue)¶
Clears an existing queue.
First checks if the queue does exist in the pipeline configuration.
- debug(sections=None)¶
Give debugging output
- get_queues(with_internal_queues=False)¶
- Returns
4-tuple of source, destination, internal queues, and all queues combined.
The returned values are only queue names, not their paths. I.e. if there is a bot with destination queues = {"_default": "one", "other": ["two", "three"]}, only the set {"one", "two", "three"} gets returned. (Note that the "_default" path has a single string and the "other" path has a list that gets flattened.)
- list(kind=None, non_zero=False, count=False, configured=False)¶
- list_bots(non_zero=False, configured=False)¶
Lists all (configured) bots from runtime configuration or generated on demand with bot id/module and description and parameters.
If description is not set, None is used instead.
- list_queues(non_zero=False, count=False)¶
- load_defaults_configuration(silent=False)¶
- log_bot_message(status, *args)¶
- log_botnet_message(status, group=None)¶
- log_log_messages(messages)¶
- read_bot_log(bot_id, log_level, number_of_lines)¶
- run()¶
- upgrade_conf(previous=None, dry_run=None, function=None, force=None, state_file: str = '/opt/intelmq/var/lib/state.json', no_backup=False)¶
Upgrade the IntelMQ configuration after a version upgrade.
- Parameters
previous – Assume the given version as the previous version
function – Only execute this upgrade function
force – Also upgrade if not necessary
state_file – location of the state file
no_backup – Do not create backups of state and configuration files
state_file:
version_history = [..., [2, 0, 0], [2, 0, 1]]
upgrades = {
    "v112_feodo_tracker_domains": true,
    "v112_feodo_tracker_ips": false,
    "v200beta1_ripe_expert": false
}
results = [
    {"function": "v112_feodo_tracker_domains", "success": true, "retval": null, "time": "..."},
    {"function": "v112_feodo_tracker_domains", "success": false, "retval": "fix it manually", "message": "fix it manually", "time": "..."},
    {"function": "v200beta1_ripe_expert", "success": false, "traceback": "...", "time": "..."}
]
- write_updated_runtime_config(filename='/opt/intelmq/etc/runtime.yaml')¶
- class intelmq.bin.intelmqctl.Parameters¶
Bases:
object
- intelmq.bin.intelmqctl.main()¶
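For orientation, a sketch of driving the controller from Python instead of the intelmqctl command line; the import location of ReturnType is an assumption based on the signature above:

```python
# Sketch only: programmatic use of intelmqctl's controller class.
# Assumes ReturnType is importable from the same module as the controller.
from intelmq.bin.intelmqctl import IntelMQController, ReturnType

controller = IntelMQController(interactive=False,
                               returntype=ReturnType.PYTHON,
                               quiet=True)
# Query the status of all configured bots; the exact return shape
# depends on the chosen returntype.
status = controller.botnet_status()
print(status)
```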
- class intelmq.bin.intelmqdump.Completer(possible_values, queues=False)¶
Bases:
object
- complete(text, state)¶
- queues = None¶
- state = None¶
- intelmq.bin.intelmqdump.dump_info(fname, file_descriptor=None)¶
- intelmq.bin.intelmqdump.load_meta(dump)¶
- intelmq.bin.intelmqdump.main(argv=None)¶
- intelmq.bin.intelmqdump.save_file(handle, content)¶
© 2019-2021 nic.at GmbH <intelmq-team@cert.at>
SPDX-License-Identifier: AGPL-3.0-or-later
Sets up an intelmq environment after installation or upgrade by:
- creating needed directories
- setting intelmq as owner of those
- providing example configuration files if not already existing

If intelmq-api is installed, similar steps are performed:
- creates needed directories
- sets the webserver as group for them
- sets group write permissions

Reasoning: Pip does not (and cannot) create /opt/intelmq/user-given ROOT_DIR, as described in https://github.com/certtools/intelmq/issues/819
- intelmq.bin.intelmqsetup.basic_checks(skip_ownership)¶
- intelmq.bin.intelmqsetup.change_owner(file: str, owner: Optional[str] = None, group: Optional[str] = None, log: bool = True)¶
- intelmq.bin.intelmqsetup.create_directory(directory: str, octal_mode: int)¶
- intelmq.bin.intelmqsetup.debian_activate_apache_config(config_name: str)¶
- intelmq.bin.intelmqsetup.find_webserver_configuration_directory()¶
- intelmq.bin.intelmqsetup.find_webserver_user()¶
- intelmq.bin.intelmqsetup.intelmqsetup_api(ownership: bool = True, webserver_user: Optional[str] = None)¶
- intelmq.bin.intelmqsetup.intelmqsetup_api_webserver_configuration(webserver_configuration_directory: Optional[str] = None)¶
- intelmq.bin.intelmqsetup.intelmqsetup_core(ownership=True, state_file='/opt/intelmq/var/lib/state.json')¶
- intelmq.bin.intelmqsetup.intelmqsetup_manager_generate()¶
- intelmq.bin.intelmqsetup.intelmqsetup_manager_webserver_configuration(webserver_configuration_directory: Optional[str] = None)¶
- intelmq.bin.intelmqsetup.main()¶
- intelmq.bin.rewrite_config_files.rewrite(fobj)¶
intelmq.bots package¶
Reference: https://abusix.com/contactdb.html
RIPE abuse contacts resolving through DNS TXT queries
- class intelmq.bots.experts.abusix.expert.AbusixExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Add abuse contact information from the Abusix online service for the source and destination IP addresses
- init()¶
- process()¶
- intelmq.bots.experts.abusix.expert.BOT¶
alias of
AbusixExpertBot
Aggregate Expert
SPDX-FileCopyrightText: 2021 Intelmq Team <intelmq-team@cert.at> SPDX-License-Identifier: AGPL-3.0-or-later
- class intelmq.bots.experts.aggregate.expert.AggregateExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
,CacheMixin
Aggregation expert bot
- cleanup()¶
- fields: str = 'classification.type, classification.identifier'¶
- init()¶
- process()¶
- redis_cache_db: int = 8¶
- threshold: int = 10¶
- timespan: str = '1 hour'¶
- intelmq.bots.experts.aggregate.expert.BOT¶
alias of
AggregateExpertBot
- class intelmq.bots.experts.asn_lookup.expert.ASNLookupExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Add ASN and netmask information from a local BGP dump
- autoupdate_cached_database: bool = True¶
- static check(parameters)¶
The bot's own check function can perform individual checks on its parameters. init() is not called before; this is a staticmethod which does not require class initialization.
- Parameters
parameters – Bot’s parameters, defaults and runtime merged together
- Returns
None or a list of [log_level, log_message] pairs, both strings. log_level must be a valid log level.
- Return type
output
- database = None¶
- init()¶
- process()¶
- classmethod run(parsed_args=None)¶
- classmethod update_database(verbose=False)¶
- intelmq.bots.experts.asn_lookup.expert.BOT¶
alias of
ASNLookupExpertBot
- intelmq.bots.experts.csv_converter.expert.BOT¶
alias of
CSVConverterExpertBot
- class intelmq.bots.experts.csv_converter.expert.CSVConverterExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Convert data to CSV
- delimiter: str = ','¶
- fieldnames: str = 'time.source,classification.type,source.ip'¶
- init()¶
- process()¶
- intelmq.bots.experts.cymru_whois.expert.BOT¶
alias of
CymruExpertBot
- class intelmq.bots.experts.cymru_whois.expert.CymruExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
,CacheMixin
Add ASN, netmask, AS name, country, registry and allocation time from the Cymru Whois DNS service
- overwrite = False¶
- process()¶
- redis_cache_db: int = 5¶
- redis_cache_host: str = '127.0.0.1'¶
- redis_cache_password: str = None¶
- redis_cache_port: int = 6379¶
- redis_cache_ttl: int = 86400¶
Deduplicator expert bot
Parameters:
- redis_cache_host: string
- redis_cache_port: int
- redis_cache_db: int
- redis_cache_ttl: int
- redis_cache_password: string, default: None
- filter_type: string ["blacklist", "whitelist"]
- bypass: boolean, default: False
- filter_keys: string with multiple keys separated by commas. Please note that the time.observation key is never considered, as the system always ignores it.
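An illustrative parameter set for this bot, shown as a Python dict (in practice these parameters live in the runtime configuration):

```python
# Illustrative deduplicator parameters (normally set in runtime.yaml).
parameters = {
    "redis_cache_host": "127.0.0.1",
    "redis_cache_port": 6379,
    "redis_cache_db": 6,
    "redis_cache_ttl": 86400,       # seconds a message hash stays cached
    "filter_type": "whitelist",     # hash only the keys listed below
    "filter_keys": "source.ip,classification.type",
}
```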
- intelmq.bots.experts.deduplicator.expert.BOT¶
alias of
DeduplicatorExpertBot
- class intelmq.bots.experts.deduplicator.expert.DeduplicatorExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
,CacheMixin
Detect and drop exact duplicate messages. Message hashes are cached in the Redis database
- bypass = False¶
- filter_keys: str = None¶
- filter_type: str = 'blacklist'¶
- init()¶
- process()¶
- redis_cache_db: int = 6¶
- redis_cache_host: str = '127.0.0.1'¶
- redis_cache_password: str = None¶
- redis_cache_port: int = 6379¶
- redis_cache_ttl: int = 86400¶
As the frontend reverse-proxies the (backend) API, a "502 Bad Gateway" status code is treated the same as a timeout, i.e. it will be retried instead of failing.
- intelmq.bots.experts.do_portal.expert.BOT¶
alias of
DoPortalExpertBot
- class intelmq.bots.experts.do_portal.expert.DoPortalExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Retrieve abuse contact information for the source IP address from a do-portal instance
- init()¶
- mode: str = 'append'¶
- portal_api_key: str = None¶
- portal_url: str = None¶
- process()¶
The library publicsuffixlist will be used if installed, otherwise our own internal fallback is used.
- intelmq.bots.experts.domain_suffix.expert.BOT¶
alias of
DomainSuffixExpertBot
- class intelmq.bots.experts.domain_suffix.expert.DomainSuffixExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Extract the domain suffix from a domain and save it in the domain_suffix field. Requires a local file with valid domain suffixes
- autoupdate_cached_database: bool = True¶
- static check(parameters)¶
The bot's own check function can perform individual checks on its parameters. init() is not called before; this is a staticmethod which does not require class initialization.
- Parameters
parameters – Bot’s parameters, defaults and runtime merged together
- Returns
None or a list of [log_level, log_message] pairs, both strings. log_level must be a valid log level.
- Return type
output
- field: str = None¶
- init()¶
- process()¶
- classmethod run(parsed_args=None)¶
- suffix_file: str = None¶
- classmethod update_database(verbose=False)¶
Domain validator
SPDX-FileCopyrightText: 2021 Marius Karotkis <marius.karotkis@gmail.com> SPDX-License-Identifier: AGPL-3.0-or-later
- intelmq.bots.experts.domain_valid.expert.BOT¶
alias of
DomainValidExpertBot
- class intelmq.bots.experts.domain_valid.expert.DomainValidExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
- domain_field: str = 'source.fqdn'¶
- get_tlds_domain_list()¶
- init()¶
- process()¶
- classmethod run(parsed_args=None)¶
- tlds_domains_list: str = '/opt/intelmq/var/lib/bots/domain_valid/tlds-alpha-by-domain.txt'¶
- classmethod update_database(verbose=False)¶
Reducer bot
- intelmq.bots.experts.field_reducer.expert.BOT¶
alias of
FieldReducerExpertBot
- intelmq.bots.experts.filter.expert.BOT¶
alias of
FilterExpertBot
- class intelmq.bots.experts.filter.expert.FilterExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Filter events, supports named paths for splitting the message flow
- doFilter(event, key, condition)¶
- equalsFilter(event, key, value)¶
- filter_action: str = None¶
- filter_key: str = None¶
- filter_regex: str = None¶
- filter_value: str = None¶
- init()¶
- not_after = None¶
- not_before = None¶
- parse_timeattr(time_attr)¶
Parses relative or absolute time specification, decides how to parse by checking if the string contains any timespan identifier.
See also https://github.com/certtools/intelmq/issues/1523 dateutil.parser.parse detects strings like 10 hours as absolute time.
- process()¶
- regexSearchFilter(event, key)¶
- intelmq.bots.experts.format_field.expert.BOT¶
alias of
FormatFieldExpertBot
- class intelmq.bots.experts.format_field.expert.FormatFieldExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Perform string method operations on column values
- init()¶
- new_value = ''¶
- old_value = ''¶
- process()¶
- replace_column = ''¶
- replace_count = 1¶
- split_column = None¶
- split_separator = ','¶
- strip_chars = ' '¶
- strip_columns = ''¶
Generic DB Lookup
- intelmq.bots.experts.generic_db_lookup.expert.BOT¶
alias of
GenericDBLookupExpertBot
- class intelmq.bots.experts.generic_db_lookup.expert.GenericDBLookupExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
-
Fetch data from a database
- database: str = 'intelmq'¶
- engine: str = '<postgresql OR sqlite>'¶
- host: str = 'localhost'¶
- init()¶
- match_fields = {'source.asn': 'asn'}¶
- overwrite: bool = False¶
- password: str = '<password>'¶
- port: int = 5432¶
- process()¶
- replace_fields = {'contact': 'source.abuse_contact', 'note': 'comment'}¶
- sslmode: str = 'require'¶
- table: str = 'contacts'¶
- user: str = 'intelmq'¶
Uses https://pypi.org/project/geolib/ https://github.com/joyanujoy/geolib
- intelmq.bots.experts.geohash.expert.BOT¶
alias of
GeohashExpertBot
- class intelmq.bots.experts.geohash.expert.GeohashExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Compute the geohash from longitude/latitude information, save it to extra.(source|destination)
- init()¶
- overwrite: bool = False¶
- precision: int = 7¶
- process()¶
These are all possible gaierrors according to the source: http://www.castaglia.org/proftpd/doc/devel-guide/src/lib/glibc-gai_strerror.c.html
# define EAI_BADFLAGS -1 /* Invalid value for `ai_flags' field. */
# define EAI_NONAME -2 /* NAME or SERVICE is unknown. */
# define EAI_AGAIN -3 /* Temporary failure in name resolution. */
# define EAI_FAIL -4 /* Non-recoverable failure in name res. */
# define EAI_NODATA -5 /* No address associated with NAME. */
# define EAI_FAMILY -6 /* `ai_family' not supported. */
# define EAI_SOCKTYPE -7 /* `ai_socktype' not supported. */
# define EAI_SERVICE -8 /* SERVICE not supported for `ai_socktype'. */
# define EAI_ADDRFAMILY -9 /* Address family for NAME not supported. */
# define EAI_MEMORY -10 /* Memory allocation failure. */
# define EAI_SYSTEM -11 /* System error returned in `errno'. */
We treat some of them as valid (i.e. the record does not exist) and others as temporary or permanent failures (default).
- intelmq.bots.experts.gethostbyname.expert.BOT¶
alias of
GethostbynameExpertBot
- class intelmq.bots.experts.gethostbyname.expert.GethostbynameExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Resolve the IP address for the FQDN
- fallback_to_url: bool = True¶
- gaierrors_to_ignore: Tuple[int] = ()¶
- init()¶
- overwrite: bool = False¶
- process()¶
HTTP Content Expert Bot
SPDX-FileCopyrightText: 2021 Birger Schacht <schacht@cert.at> SPDX-License-Identifier: AGPL-3.0-or-later
- intelmq.bots.experts.http.expert_content.BOT¶
alias of
HttpContentExpertBot
- class intelmq.bots.experts.http.expert_content.HttpContentExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Test if a given string is part of the content for a given URL
Parameters:
- field (str): The name of the field containing the URL to be checked (defaults to 'source.url').
- needle (str): The string that the content available on the URL is checked for.
- overwrite (bool): Specifies if an existing 'status' value should be overwritten.
- field: str = 'source.url'¶
- init()¶
- needle: str = None¶
- overwrite: bool = True¶
- process()¶
HTTP Status Expert Bot
SPDX-FileCopyrightText: 2021 Birger Schacht <schacht@cert.at> SPDX-License-Identifier: AGPL-3.0-or-later
- intelmq.bots.experts.http.expert_status.BOT¶
alias of
HttpStatusExpertBot
- class intelmq.bots.experts.http.expert_status.HttpStatusExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Fetch the HTTP Status for a given URL
- Parameters
field (str) – The name of the field containing the URL to be checked (defaults to ‘source.url’).
success_status_codes (List) – A list of success status codes. If this parameter is omitted or the list is empty, successful status codes are the ones between 200 and 400.
overwrite (bool) – Specifies if an existing ‘status’ value should be overwritten.
- field: str = 'source.url'¶
- overwrite: bool = True¶
- process()¶
- success_status_codes: List[int] = []¶
IDEA classification: https://idea.cesnet.cz/en/classifications
- intelmq.bots.experts.idea.expert.BOT¶
alias of
IdeaExpertBot
- class intelmq.bots.experts.idea.expert.IdeaExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Convert events into the IDEA format
- TYPE_TO_CATEGORY = {'application-compromise': 'Intrusion.AppCompromise', 'blacklist': 'Other', 'brute-force': 'Attempt.Login', 'burglary': 'Intrusion', 'c2-server': 'Intrusion.Botnet', 'copyright': 'Fraud.Copyright', 'data-leak': 'Information', 'data-loss': 'Information', 'ddos': 'Availability.DDoS', 'ddos-amplifier': 'Intrusion.Botnet', 'dga-domain': 'Anomaly.Behaviour', 'dos': 'Availability.DoS', 'exploit': 'Attempt.Exploit', 'harmful-speech': 'Abusive.Harassment', 'ids-alert': 'Attempt.Exploit', 'infected-system': 'Malware', 'information-disclosure': 'Information.UnauthorizedAccess', 'malware': 'Malware', 'malware-configuration': 'Malware', 'malware-distribution': 'Malware', 'masquerade': 'Fraud.Scam', 'misconfiguration': 'Availability.Outage', 'other': 'Other', 'outage': 'Availability.Outage', 'phishing': 'Fraud.Phishing', 'potentially-unwanted-accessible': 'Vulnerable.Open', 'privileged-account-compromise': 'Intrusion.AdminCompromise', 'proxy': 'Vulnerable.Config', 'sabotage': 'Availability.Sabotage', 'scanner': 'Recon.Scanning', 'sniffing': 'Recon.Sniffing', 'social-engineering': 'Recon.SocialEngineering', 'spam': 'Abusive.Spam', 'system-compromise': 'Intrusion.AdminCompromise', 'test': 'Test', 'tor': 'Other', 'unauthorised-information-access': 'Information.UnauthorizedAccess', 'unauthorised-information-modification': 'Information.UnauthorizedModification', 'unauthorized-use-of-resources': 'Fraud.UnauthorizedUsage', 'undetermined': 'Other', 'unprivileged-account-compromise': 'Intrusion.UserCompromise', 'violence': 'Abusive.Violence', 'vulnerable-system': 'Vulnerable.Config', 'weak-crypto': 'Vulnerable.Config'}¶
- TYPE_TO_SOURCE_TYPE = {'c2-server': 'CC', 'dga-domain': 'DGA', 'malware-configuration': 'MalwareConf', 'malware-distribution': 'Malware', 'phishing': 'Phishing', 'proxy': 'Proxy', 'tor': 'Tor'}¶
- get_value(src, value)¶
- init()¶
- process()¶
- process_dict(src, description)¶
- process_list(src, description)¶
- test_mode: bool = False¶
- intelmq.bots.experts.idea.expert.addr4(s)¶
- intelmq.bots.experts.idea.expert.addr6(s)¶
- intelmq.bots.experts.idea.expert.quot(s)¶
- intelmq.bots.experts.jinja.expert.BOT¶
alias of
JinjaExpertBot
- class intelmq.bots.experts.jinja.expert.JinjaExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
Bot
Modify the message using the Jinja templating engine.

Example:

fields:
    output: The provider is {{ msg['feed.provider'] }}!
    feed.url: "{{ msg['feed.url'] | upper }}"
    extra.somejinjaoutput: file:///etc/intelmq/somejinjatemplate.j2
- fields: Dict[str, Union[str, Template]] = {}¶
- init()¶
- overwrite: bool = False¶
- process()¶
- intelmq.bots.experts.lookyloo.expert.BOT¶
alias of
LookyLooExpertBot
This product includes GeoLite2 data created by MaxMind, available from http://www.maxmind.com.
- intelmq.bots.experts.maxmind_geoip.expert.BOT¶
alias of
GeoIPExpertBot
- class intelmq.bots.experts.maxmind_geoip.expert.GeoIPExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Add geolocation information from a local MaxMind database to events (country, city, longitude, latitude)
- autoupdate_cached_database: bool = True¶
- database: str = '/opt/intelmq/var/lib/bots/maxmind_geoip/GeoLite2-City.mmdb'¶
- init()¶
- license_key: str = '<insert Maxmind license key>'¶
- overwrite: bool = False¶
- process()¶
- classmethod run(parsed_args=None)¶
- classmethod update_database(verbose=False)¶
- use_registered: bool = False¶
MARExpertBot queries environment for occurrences of IOCs via McAfee Active Response.
Parameters:
- dxl_config_file: string
- lookup_type: string
- intelmq.bots.experts.mcafee.expert_mar.BOT¶
alias of
MARExpertBot
- class intelmq.bots.experts.mcafee.expert_mar.MARExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Query connections to IP addresses to the given destination within the local environment using McAfee Active Response queries
- MAR_Query(mar_search_str)¶
- QUERY = {'DestFQDN': [{'name': 'DNSCache', 'output': 'hostname', 'op': 'EQUALS', 'value': '%(destination.fqdn)s'}], 'DestIP': [{'name': 'NetworkFlow', 'output': 'dst_ip', 'op': 'EQUALS', 'value': '%(destination.ip)s'}], 'DestSocket': [{'name': 'NetworkFlow', 'output': 'dst_ip', 'op': 'EQUALS', 'value': '%(destination.ip)s'}, {'name': 'NetworkFlow', 'output': 'dst_port', 'op': 'EQUALS', 'value': '%(destination.port)s'}], 'Hash': [{'name': 'Files', 'output': 'md5', 'op': 'EQUALS', 'value': '%(malware.hash.md5)s'}, {'name': 'Files', 'output': 'sha1', 'op': 'EQUALS', 'value': '%(malware.hash.sha1)s'}, {'name': 'Files', 'output': 'sha256', 'op': 'EQUALS', 'value': '%(malware.hash.sha256)s'}]}¶
- dxl_config_file: str = '<insert /path/to/dxlclient.config>'¶
- init()¶
- lookup_type: str = '<Hash|DestSocket|DestIP|DestFQDN>'¶
- process()¶
An expert for looking up values in MISP.
- param - misp_url
URL of the MISP server
- param - misp_key
API key for accessing MISP
- param - http_verify_cert
true or false, check the validity of the certificate
- intelmq.bots.experts.misp.expert.BOT¶
alias of
MISPExpertBot
- class intelmq.bots.experts.misp.expert.MISPExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Looking up the IP address in MISP instance and retrieve attribute and event UUIDs
- init()¶
- misp_key: str = '<insert MISP Authkey>'¶
- misp_url: str = "<insert url of MISP server (with trailing '/')>"¶
- process()¶
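A rough sketch of this kind of attribute lookup with pymisp (URL and key are placeholders; the bot's exact search parameters may differ):

    # Sketch: search MISP attributes for an IP address with pymisp.
    # URL and key are placeholders; not the bot's exact code.
    from pymisp import PyMISP

    misp = PyMISP('https://misp.example.com/', '<insert MISP Authkey>', ssl=True)
    result = misp.search(controller='attributes', value='192.0.2.1')
    for attribute in result.get('Attribute', []):
        print(attribute['uuid'], attribute['event_id'])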
The Modify Expert bot lets you manipulate all fields with a config file.
- intelmq.bots.experts.modify.expert.BOT¶
alias of
ModifyExpertBot
- class intelmq.bots.experts.modify.expert.MatchGroupMapping(match)¶
Bases:
object
Wrapper for a regexp match object with a dict-like interface. With this, we can access the match groups from within a format replacement field.
- class intelmq.bots.experts.modify.expert.ModifyExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Perform arbitrary changes to event’s fields based on regular-expression-based rules on different values. See the bot’s documentation for some examples
- apply_action(event, action, matches)¶
- case_sensitive: bool = True¶
- configuration_path: str = '/opt/intelmq/var/lib/bots/modify/modify.conf'¶
- init()¶
- matches(identifier, event, condition)¶
- maximum_matches = None¶
- overwrite: bool = True¶
- process()¶
- intelmq.bots.experts.modify.expert.is_re_pattern(value)¶
Checks if the given value is a compiled re pattern
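For orientation, a hedged sketch of the shape of such a rule set in the file referenced by configuration_path, written as the equivalent Python data structure (rule name, condition and result are illustrative only):

    # Sketch: the shape of a modify.conf rule set, as a Python structure.
    # Names, patterns and values here are illustrative, not a real ruleset.
    rules = [
        {
            "rulename": "zeus identifier",
            "if": {"malware.name": "^zeus(_.*)?$"},        # regex match on a field
            "then": {"classification.identifier": "zeus"}  # values to set
        }
    ]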
Queries the CERT.at national CERT abuse contact and geolocation service: https://contacts.cert.at/cgi-bin/abuse-nationalcert.pl
HTTP GET: https://contacts.cert.at/cgi-bin/abuse-nationalcert.pl?ip=1.2.3.4
HTTP POST: https://contacts.cert.at/cgi-bin/abuse-nationalcert.pl
Options:
&bShowNationalCERT=on: show national CERT contact info
&bShowHeader=on: display a CSV header
&bVerbose=on: display the source of the data, and other information
&bFilter=off: act as a filter: only show lines which geolocate to "AT"
&bKeepLoglines=off: keep original log lines (separated by "#")
&sep={TAB, comma, semicolon, pipe}: separator for the (output) CSV format
- intelmq.bots.experts.national_cert_contact_certat.expert.BOT¶
alias of
NationalCERTContactCertATExpertBot
- class intelmq.bots.experts.national_cert_contact_certat.expert.NationalCERTContactCertATExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Add country and abuse contact information from the CERT.at national CERT Contact Database. Set filter to true if you want to filter out events for Austria. Set overwrite_cc to true if you want to overwrite an existing country code value
- filter: bool = False¶
- http_verify_cert: bool = True¶
- init()¶
- overwrite_cc: bool = False¶
- process()¶
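A minimal sketch of the HTTP GET described above, using requests (IP and options are illustrative):

    # Sketch: query the CERT.at national CERT contact service shown above.
    import requests

    response = requests.get(
        'https://contacts.cert.at/cgi-bin/abuse-nationalcert.pl',
        params={'ip': '1.2.3.4', 'bShowNationalCERT': 'on', 'sep': 'comma'},
    )
    print(response.text)  # CSV line(s) with country and contact information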
- intelmq.bots.experts.rdap.expert.BOT¶
alias of
RDAPExpertBot
- class intelmq.bots.experts.rdap.expert.RDAPExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
,CacheMixin
Get RDAP data
- init()¶
- overwrite: bool = True¶
- parse_entities(vcardArray) list ¶
- process()¶
- rdap_bootstrapped_servers: dict = {}¶
- rdap_order: list = ['abuse', 'technical', 'administrative', 'registrant', 'registrar']¶
- redis_cache_db: int = 8¶
- redis_cache_host: str = '127.0.0.1'¶
- redis_cache_password: str = None¶
- redis_cache_port: int = 6379¶
- redis_cache_ttl: int = 86400¶
See README for database download.
- intelmq.bots.experts.recordedfuture_iprisk.expert.BOT¶
alias of
RecordedFutureIPRiskExpertBot
- class intelmq.bots.experts.recordedfuture_iprisk.expert.RecordedFutureIPRiskExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Adds the Risk Score from RecordedFuture IPRisk associated with source.ip or destination.ip with a local database
- api_token: str = '<insert Recorded Future IPRisk API token>'¶
- autoupdate_cached_database: bool = True¶
- database: str = '/opt/intelmq/var/lib/bots/recordedfuture_iprisk/rfiprisk.dat'¶
- init()¶
- overwrite: bool = False¶
- process()¶
- classmethod run(parsed_args=None)¶
- classmethod update_database(verbose=False)¶
Remove Affix
SPDX-FileCopyrightText: 2021 Marius Karotkis <marius.karotkis@gmail.com> SPDX-License-Identifier: AGPL-3.0-or-later
- intelmq.bots.experts.remove_affix.expert.BOT¶
alias of
RemoveAffixExpertBot
- class intelmq.bots.experts.remove_affix.expert.RemoveAffixExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
Bot
- affix: str = 'www.'¶
- field: str = 'source.fqdn'¶
- process()¶
- remove_prefix: bool = True¶
- removeprefix(field: str, prefix: str) str ¶
- removesuffix(field: str, suffix: str) str ¶
- intelmq.bots.experts.reverse_dns.expert.BOT¶
alias of
ReverseDnsExpertBot
- exception intelmq.bots.experts.reverse_dns.expert.InvalidPTRResult¶
Bases:
ValueError
- class intelmq.bots.experts.reverse_dns.expert.ReverseDnsExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
,CacheMixin
Get the corresponding domain name for the source and destination IP addresses
- cache_ttl_invalid_response: int = 60¶
- overwrite: bool = False¶
- process()¶
- redis_cache_db: int = 7¶
- redis_cache_host: str = '127.0.0.1'¶
- redis_cache_password: str = None¶
- redis_cache_port: int = 6379¶
- redis_cache_ttl: int = 86400¶
RFC 1918 expert: drops local IP addresses from a given record, and a bit more.
It checks for:
RFC 1918 IPv4 hosts
localhost, multicast and test LANs
link-local and documentation LANs in IPv6
RFC 5398 ASNs
Set the parameter "fields" to the name(s) of the field(s) to be filtered out; several fields can be given, separated by ",". Whole events can be discarded by setting the corresponding "policy" entry to "drop".
Sources: https://tools.ietf.org/html/rfc1918 https://tools.ietf.org/html/rfc2606 https://tools.ietf.org/html/rfc3849 https://tools.ietf.org/html/rfc4291 https://tools.ietf.org/html/rfc5737 https://en.wikipedia.org/wiki/IPv4 https://en.wikipedia.org/wiki/Autonomous_system_(Internet)
- intelmq.bots.experts.rfc1918.expert.BOT¶
alias of
RFC1918ExpertBot
- class intelmq.bots.experts.rfc1918.expert.RFC1918ExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Removes fields or discards events if an IP address or domain is invalid as defined in standards like RFC 1918 (invalid, local, reserved, documentation). IP address, FQDN and URL fields are supported
- static check(parameters)¶
The bot's own check function can perform individual checks on its parameters. init() is not called before; this is a static method which does not require class initialization.
- Parameters
parameters – Bot's parameters, defaults and runtime merged together
- Returns
None or a list of [log_level, log_message] pairs, both strings. log_level must be a valid log level.
- Return type
output
- fields: str = 'destination.ip,source.ip,source.url'¶
- init()¶
- is_in_domains(value)¶
- is_in_net(ip)¶
- is_subdomain(value)¶
- policy: str = 'del,drop,drop'¶
- process()¶
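The fields and policy parameters are parallel comma-separated lists: entry n of policy applies to entry n of fields. A standalone sketch of the del/drop logic, using only the standard library (note that ipaddress' is_private covers more reserved ranges than RFC 1918 alone):

    # Sketch: the fields/policy pairing, standard library only.
    # Not the bot's actual implementation.
    import ipaddress
    from typing import Optional

    fields = 'destination.ip,source.ip'.split(',')
    policy = 'del,drop'.split(',')

    def apply_policy(event: dict) -> Optional[dict]:
        for field, action in zip(fields, policy):
            value = event.get(field)
            if value and ipaddress.ip_address(value).is_private:
                if action == 'drop':
                    return None       # discard the whole event
                del event[field]      # 'del': remove only the offending field
        return event

    print(apply_policy({'source.ip': '10.0.0.1'}))          # None: event dropped
    print(apply_policy({'destination.ip': '192.168.0.1'}))  # {}: field removed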
Reference: https://stat.ripe.net/docs/data_api https://github.com/RIPE-NCC/whois/wiki/WHOIS-REST-API-abuse-contact
- intelmq.bots.experts.ripe.expert.BOT¶
alias of
RIPEExpertBot
- class intelmq.bots.experts.ripe.expert.RIPEExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
,CacheMixin
Fetch abuse contact and/or geolocation information for the source and/or destination IP addresses and/or ASNs of the events
- GEOLOCATION_REPLY_TO_INTERNAL = {('cc', 'country'), ('city', 'city'), ('latitude', 'latitude'), ('longitude', 'longitude')}¶
- QUERY = {'db_asn': 'https://rest.db.ripe.net/abuse-contact/as{}.json', 'db_ip': 'https://rest.db.ripe.net/abuse-contact/{}.json', 'stat': 'https://stat.ripe.net/data/abuse-contact-finder/data.json?resource={}', 'stat_geolocation': 'https://stat.ripe.net/data/maxmind-geo-lite/data.json?resource={}'}¶
- REPLY_TO_DATA = {'db_asn': <function RIPEExpertBot.<lambda>>, 'db_ip': <function RIPEExpertBot.<lambda>>, 'stat': <function RIPEExpertBot.<lambda>>, 'stat_geolocation': <function RIPEExpertBot.<lambda>>}¶
- init()¶
- mode: str = 'append'¶
- process()¶
- query_ripe_db_asn: bool = True¶
- query_ripe_db_ip: bool = True¶
- query_ripe_stat_asn: bool = True¶
- query_ripe_stat_geolocation: bool = True¶
- query_ripe_stat_ip: bool = True¶
- redis_cache_db: int = 10¶
- redis_cache_host: str = '127.0.0.1'¶
- redis_cache_password: str = None¶
- redis_cache_port: int = 6379¶
- redis_cache_ttl: int = 86400¶
- intelmq.bots.experts.ripe.expert.clean_geo(geo_data)¶
Clean RIPE reply specifics for geolocation query
- intelmq.bots.experts.ripe.expert.clean_string(s)¶
Clean RIPE reply specifics for splittable string replies
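The QUERY constant above lists the endpoints used; a minimal sketch of one abuse-contact lookup via RIPEstat (the queried resource is a placeholder):

    # Sketch: one abuse-contact lookup against the RIPEstat endpoint from
    # QUERY above. The resource is a placeholder.
    import requests

    url = 'https://stat.ripe.net/data/abuse-contact-finder/data.json?resource={}'
    reply = requests.get(url.format('193.0.6.139')).json()
    print(reply['data'])  # contains the abuse contact information, if any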
SieveExpertBot filters and modifies events based on a specification language similar to mail sieve.
- param file
string
- intelmq.bots.experts.sieve.expert.BOT¶
alias of
SieveExpertBot
- class intelmq.bots.experts.sieve.expert.Procedure(value)¶
Bases:
Enum
An enumeration.
- CONTINUE = 1¶
- DROP = 3¶
- KEEP = 2¶
- class intelmq.bots.experts.sieve.expert.SieveExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Filter and modify events based on a sieve-based language
- static check(parameters)¶
The bot's own check function can perform individual checks on its parameters. init() is not called before; this is a static method which does not require class initialization.
- Parameters
parameters – Bot's parameters, defaults and runtime merged together
- Returns
None or a list of [log_level, log_message] pairs, both strings. log_level must be a valid log level.
- Return type
output
- compute_basic_math(action, event) str ¶
- file: str = '/opt/intelmq/var/lib/bots/sieve/filter.sieve'¶
- static get_linecol(model_obj, as_dict=False)¶
Gets the position of a model object in the sieve file.
- Parameters
model_obj – the model object
as_dict – return the position as a dict instead of a tuple.
- Returns
Returns the line and column number for the model object’s position in the sieve file. Default return type is a tuple of (line,col). Optionally, returns a dict when as_dict == True.
- init() None ¶
- static init_metamodel()¶
- match_expression(expr, event) bool ¶
- process() None ¶
- process_bool_match(key, op, value, event)¶
- process_condition(cond, event) bool ¶
- process_conjunction(conj, event) bool ¶
- static process_exist_match(key, op, event) bool ¶
- process_ip_range_match(key, ip_range, event) bool ¶
- process_list_match(key, op, value, event) bool ¶
- process_multi_numeric_match(key, op, value, event) bool ¶
- process_multi_string_match(key, op, value, event) bool ¶
- process_single_numeric_match(key, op, value, event) bool ¶
- process_single_string_match(key, op, value, event) bool ¶
- process_statement(statement, event)¶
- static read_sieve_file(filename, metamodel)¶
- static validate_ip_address(ipaddr) None ¶
- static validate_ip_range(ip_range) None ¶
- static validate_numeric_match(num_match) None ¶
Validates a numeric match expression.
Checks if the event key (given on the left hand side of the expression) is of a valid type for a numeric match, according to the IntelMQ harmonization.
- Raises
TextXSemanticError – when the key is of an incompatible type for numeric match expressions.
- static validate_string_match(str_match) None ¶
Validates a string match expression.
Checks if the type of the value given on the right hand side of the expression matches the event key in the left hand side, according to the IntelMQ harmonization.
- Raises
TextXSemanticError – when the value is of incompatible type with the event key.
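For orientation, a hedged sketch of a small filter file for this bot, written out from Python; the rule syntax shown is illustrative, consult the bot's documentation for the authoritative grammar:

    # Sketch: write a small sieve filter file. The rules are illustrative;
    # see the bot documentation for the authoritative grammar.
    from pathlib import Path

    sieve_rules = (
        "if source.ip == '192.0.2.1' {\n"
        "    drop\n"
        "}\n"
        "if :exists source.fqdn {\n"
        "    keep\n"
        "}\n"
    )
    Path('/opt/intelmq/var/lib/bots/sieve/filter.sieve').write_text(sieve_rules)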
Splunk saved search enrichment export bot
SPDX-FileCopyrightText: 2020 Linköping University <https://liu.se/> SPDX-License-Identifier: AGPL-3.0-or-later
Searches Splunk for fields in an event and adds search results to it.
This bot is quite slow, since it needs to submit a search job to Splunk, get the job ID, poll for the job to complete and then retrieve the results. If you have a high query load, run more instances of the bot.
- param Generic IntelMQ HTTP parameters
- param auth_token
string, Splunk authentication token
- param url
string, base URL of the Splunk REST API
- param retry_interval
integer, optional, default 5, number of seconds to wait between polling for search results to be available
- param saved_search
string, name of Splunk saved search to run
- param search_parameters
map string->string, optional, default {}, IntelMQ event fields to Splunk saved search parameters
- param result_fields
map string->string, optional, default {}, Splunk search result fields to IntelMQ event fields
- param not_found
list of strings, default ["warn", "send"], what to do if the search returns zero results. All specified actions are performed. Any reasonable combination of:
warn: log a warning message
send: send the event on unmodified
drop: drop the message
- param multiple_result_handling
list of strings, default ["warn", "use_first", "send"], what to do if the search returns more than one result. All specified actions are performed. Any reasonable combination of:
limit: limit the search so that duplicates are impossible
warn: log a warning message
use_first: use the first search result
ignore: do not modify the event
send: send the event on
drop: drop the message
- param overwrite
bool or null, optional, default null, whether search results replace existing values in the event. If null, trying to set an existing field raises intelmq.exceptions.KeyExists.
- intelmq.bots.experts.splunk_saved_search.expert.BOT¶
alias of
SplunkSavedSearchBot
- class intelmq.bots.experts.splunk_saved_search.expert.SplunkSavedSearchBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Enrich an event from Splunk search results
- auth_token: str = None¶
- init()¶
- multiple_result_handling = ['warn', 'use_first', 'send']¶
- not_found = ['warn', 'send']¶
- overwrite = None¶
- process()¶
- result_fields = {'result field': 'event field'}¶
- retry_interval: int = 5¶
- saved_search: str = None¶
- search_parameters = {'event field': 'search parameter'}¶
- update_event(event, search_result)¶
- url: str = None¶
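The slow path described above (submit a job, poll for completion, fetch results) corresponds roughly to this hedged sketch against Splunk's REST API; base URL, token and search name are placeholders, and the endpoint paths are an assumption based on Splunk's documented interface, not taken from this bot's code:

    # Sketch: dispatch a saved search, poll for completion, fetch results.
    # URL, token and search name are placeholders; endpoint paths are an
    # assumption based on Splunk's documented REST API.
    import time
    import requests

    base = 'https://splunk.example.com:8089'
    headers = {'Authorization': 'Bearer <auth_token>'}

    # 1. dispatch the saved search, obtaining a job ID (sid)
    r = requests.post(f'{base}/services/saved/searches/my_search/dispatch',
                      headers=headers, data={'output_mode': 'json'})
    sid = r.json()['sid']

    # 2. poll until the job is done
    while True:
        job = requests.get(f'{base}/services/search/jobs/{sid}',
                           headers=headers, params={'output_mode': 'json'}).json()
        if job['entry'][0]['content']['isDone']:
            break
        time.sleep(5)  # retry_interval

    # 3. retrieve the results
    results = requests.get(f'{base}/services/search/jobs/{sid}/results',
                           headers=headers, params={'output_mode': 'json'}).json()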
The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/ with extensions.
- intelmq.bots.experts.taxonomy.expert.BOT¶
alias of
TaxonomyExpertBot
Threshold value expert bot
SPDX-FileCopyrightText: 2020 Linköping University <https://liu.se/> SPDX-License-Identifier: AGPL-3.0-or-later
Given a stream of messages, this bot will let through only the single one that makes the count of similar messages go above a threshold value.
This bot is not multiprocessing safe. Do not run more than one instance on the same Redis cache database.
- param redis_cache_host
string
- param redis_cache_port
int
- param redis_cache_db
int
- param redis_cache_password
string. default: {None}
- param redis_cache_ttl
int, number of seconds to keep counts of similar messages.
- param filter_type
string [“whitelist”, “blacklist”], when determining whether two messages are similar, consider either only the named fields, or all but the named fields (time.observation is always ignored).
- param bypass
boolean default: False
- param filter_keys
list of strings, keys to exclude or include when determining whether messages are similar. time.observation is always ignored.
- param threshold
int, number of messages after which one is sent on. As long as the count is above the threshold, no new messages will be sent.
- param add_keys
optional, map of strings to strings, keys to add to forwarded messages. Regardless of this setting, the field "extra.count" will be set to the number of messages seen (which will be the threshold value).
- intelmq.bots.experts.threshold.expert.BOT¶
alias of
ThresholdExpertBot
- class intelmq.bots.experts.threshold.expert.ThresholdExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
,CacheMixin
Check if the number of similar messages during a specified time interval exceeds a set value
- add_keys: dict = {'comment': 'Threshold reached'}¶
- bypass = False¶
- filter_keys: Iterable = ['raw', 'time.observation']¶
- filter_type: str = 'blacklist'¶
- init()¶
- process()¶
- redis_cache_db: int = 11¶
- redis_cache_ttl: int = 3600¶
- threshold: int = 100¶
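The core behaviour, forwarding exactly the message that makes the count reach the threshold, can be pictured with a small in-memory sketch (the bot itself keeps its counts in Redis with a TTL):

    # Sketch: pass on only the message that makes the count reach the
    # threshold. In-memory stand-in for the bot's Redis counter.
    from collections import Counter

    threshold = 3
    counts = Counter()

    def should_forward(similarity_key: str) -> bool:
        counts[similarity_key] += 1
        return counts[similarity_key] == threshold  # only the crossing message

    for i in range(5):
        print(should_forward('same-event'))  # False False True False False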
See README for database download.
- intelmq.bots.experts.tor_nodes.expert.BOT¶
alias of
TorExpertBot
- class intelmq.bots.experts.tor_nodes.expert.TorExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Check if the IP address is a Tor Exit Node based on a local database of TOR nodes
- autoupdate_cached_database: bool = True¶
- database: str = '/opt/intelmq/var/lib/bots/tor_nodes/tor_nodes.dat'¶
- init()¶
- overwrite: bool = False¶
- process()¶
- classmethod run(parsed_args=None)¶
- classmethod update_database(verbose=False)¶
Cut a string if its length exceeds the maximum
SPDX-FileCopyrightText: 2021 Marius Karotkis <marius.karotkis@gmail.com> SPDX-License-Identifier: AGPL-3.0-or-later
- intelmq.bots.experts.truncate_by_delimiter.expert.BOT¶
alias of
TruncateByDelimiterExpertBot
Trusted Introducer Expert
SPDX-FileCopyrightText: 2021 Intelmq Team <intelmq-team@cert.at> SPDX-License-Identifier: AGPL-3.0-or-later
- intelmq.bots.experts.trusted_introducer_lookup.expert.BOT¶
alias of
TrustedIntroducerLookupExpertBot
© 2021 Sebastian Wagner <wagner@cert.at>
SPDX-License-Identifier: AGPL-3.0-or-later
https://gitlab.com/intevation/tuency/tuency/-/blob/master/backend/docs/IntelMQ-API.md
Example query:
curl -s -H "Authorization: Bearer XXX" 'https://tuency-demo1.example.com/intelmq/lookup?classification_taxonomy=availability&classification_type=backdoor&feed_provider=Team+Cymru&feed_name=FTP&feed_status=production&ip=123.123.123.23'
The same works for domain=; a query can contain both an IP address and a domain.
Example responses:
{"ip":{"destinations":[{"source":"portal","name":"Thurner","contacts":[{"email":"test@example.com"}]}]},"suppress":true,"interval":{"unit":"days","length":1}}
{"ip":{"destinations":[{"source":"portal","name":"Thurner","contacts":[{"email":"test@example.vom"}]}]},"domain":{"destinations":[{"source":"portal","name":"Thurner","contacts":[{"email":"abuse@example.at"}]}]},"suppress":true,"interval":{"unit":"immediate","length":1}}
- intelmq.bots.experts.tuency.expert.BOT¶
alias of
TuencyExpertBot
- intelmq.bots.experts.url.expert.BOT¶
alias of
URLExpertBot
- class intelmq.bots.experts.url.expert.URLExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Extract additional information for the URL.
Possibly fills the following fields: “source.fqdn”, “source.ip”, “source.port”, “source.urlpath”, “source.account”, “destination.fqdn”, “destination.ip”, “destination.port”, “destination.urlpath”, “destination.account”, “protocol.application”, “protocol.transport”
Fields “protocol.application” and “protocol.transport” are preferred from source.url.
- init()¶
- overwrite: bool = False¶
- process()¶
- skip_fields: Optional[List[str]] = None¶
- intelmq.bots.experts.url2fqdn.expert.BOT¶
alias of
Url2fqdnExpertBot
- class intelmq.bots.experts.url2fqdn.expert.Url2fqdnExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Parse the FQDN from the URL
- static check(parameters: dict) Optional[List[List[str]]] ¶
The bot's own check function can perform individual checks on its parameters. init() is not called before; this is a static method which does not require class initialization.
- Parameters
parameters – Bot's parameters, defaults and runtime merged together
- Returns
None or a list of [log_level, log_message] pairs, both strings. log_level must be a valid log level.
- Return type
output
- init()¶
- overwrite = False¶
- process()¶
- intelmq.bots.experts.uwhoisd.expert.BOT¶
alias of
UniversalWhoisExpertBot
- class intelmq.bots.experts.uwhoisd.expert.UniversalWhoisExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
The Universal Whois expert bot gets the whois entry related to a domain, hostname, IP address, or ASN from a centralised uWhoisd instance
- port: int = 4243¶
- process()¶
- server: str = 'localhost'¶
Created on Tue Jan 23 15:25:58 2018
@author: sebastian
- intelmq.bots.experts.wait.expert.BOT¶
alias of
WaitExpertBot
- class intelmq.bots.experts.wait.expert.WaitExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ExpertBot
Wait for some time or until a queue size is lower than a given number
- connect_redis()¶
- init()¶
- process()¶
- queue_db: int = 2¶
- queue_host: str = 'localhost'¶
- queue_name: str = None¶
- queue_password: str = None¶
- queue_polling_interval: float = 0.05¶
- queue_port: int = 6379¶
- queue_size: int = 0¶
- sleep_time: int = None¶
- class intelmq.bots.outputs.amqptopic.output.AMQPTopicOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Send events to an AMQP topic exchange. Requires the pika python library
- connect_server()¶
- connection_attempts: int = 3¶
- connection_heartbeat: int = 3600¶
- connection_host: str = '127.0.0.1'¶
- connection_port: int = 5672¶
- connection_vhost: str = None¶
- content_type: str = 'application/json'¶
- delivery_mode: int = 2¶
- exchange_durable: bool = True¶
- exchange_name: str = None¶
- exchange_type: str = 'topic'¶
- format_routing_key: bool = False¶
- init()¶
- keep_raw_field: bool = False¶
- message_hierarchical_output: bool = False¶
- message_jsondict_as_string: bool = False¶
- message_with_type: bool = False¶
- password: str = None¶
- process()¶
Stop the bot if it cannot connect to the AMQP server after the defined number of connection attempts
- require_confirmation: bool = True¶
- routing_key: str = None¶
- shutdown()¶
- single_key: bool = False¶
- use_ssl = False¶
- username = None¶
- intelmq.bots.outputs.amqptopic.output.BOT¶
alias of
AMQPTopicOutputBot
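A hedged sketch of the underlying publish call with pika, the library the bot requires (connection details and payload are placeholders, not the bot's own code):

    # Sketch: publish one event to a topic exchange with pika.
    # Host, exchange, routing key and payload are placeholders.
    import json
    import pika

    connection = pika.BlockingConnection(pika.ConnectionParameters(host='127.0.0.1', port=5672))
    channel = connection.channel()
    channel.basic_publish(
        exchange='intelmq',    # exchange_name
        routing_key='events',  # routing_key
        body=json.dumps({'source.ip': '192.0.2.1'}),
        properties=pika.BasicProperties(content_type='application/json',
                                        delivery_mode=2),  # 2 = persistent
    )
    connection.close()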
- intelmq.bots.outputs.blackhole.output.BOT¶
alias of
BlackholeOutputBot
Bro file output
SPDX-FileCopyrightText: 2021 Marius Karotkis <marius.karotkis@gmail.com> SPDX-License-Identifier: AGPL-3.0-or-later
- intelmq.bots.outputs.bro_file.output.BOT¶
alias of
BroFileOutputBot
- class intelmq.bots.outputs.bro_file.output.BroFileOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
- add_bro_header()¶
- static check(parameters)¶
The bot's own check function can perform individual checks on its parameters. init() is not called before; this is a static method which does not require class initialization.
- Parameters
parameters – Bot's parameters, defaults and runtime merged together
- Returns
None or a list of [log_level, log_message] pairs, both strings. log_level must be a valid log level.
- Return type
output
- encoding_errors_mode = 'strict'¶
- file: str = '/opt/intelmq/var/lib/bots/file-output/bro'¶
- format_filename: bool = False¶
- hierarchical_output: bool = False¶
- init()¶
- is_multithreadable = False¶
- keep_raw_field: bool = False¶
- message_jsondict_as_string: bool = False¶
- message_with_type: bool = False¶
- open_file(filename: Optional[str] = None)¶
- process()¶
- shutdown()¶
- single_key: bool = False¶
The ES connection can't be closed explicitly.
TODO: Support client_cert and client_key parameters, see https://github.com/certtools/intelmq/pull/1406
- intelmq.bots.outputs.elasticsearch.output.BOT¶
alias of
ElasticsearchOutputBot
- class intelmq.bots.outputs.elasticsearch.output.ElasticsearchOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Send events to an Elasticsearch database server
- elastic_host: str = '127.0.0.1'¶
- elastic_index: str = 'intelmq'¶
- elastic_port: int = 9200¶
- flatten_fields = ['extra']¶
- get_index(event_dict: dict, default_date: Optional[date] = None, default_string: str = 'unknown-date') str ¶
Returns the index name to use for the given event, based on the current bot's settings and the event's date fields.
- If the bot should rotate its Elasticsearch index, returns elastic_index-<timestamp> based on the bot's rotation option and the time fields in the event, e.g. intelmq-2018.
- If the bot should rotate its Elasticsearch index, but no time information is available in the event, this will return <elastic_index>-<default>, e.g. intelmq-unknown-date.
- If the bot should not rotate indices, returns elastic_index, e.g. intelmq.
- Parameters
event_dict – The event (as a dict) to examine.
default_date – (Optional) The default date to use for events with no time information (e.g. datetime.today()). Default: None.
default_string – (Optional) The value to append if no time is available in the event. Default: ‘unknown-date’.
- Returns
A string containing the name of the index which should store the event.
- http_password: str = None¶
- http_username: str = None¶
- http_verify_cert: bool = False¶
- init()¶
- process()¶
- replacement_char = None¶
- rotate_index: str = 'never'¶
- should_rotate()¶
- ssl_ca_certificate: str = None¶
- ssl_show_warnings: bool = True¶
- use_ssl: bool = False¶
- intelmq.bots.outputs.elasticsearch.output.get_event_date(event_dict: dict) date ¶
- intelmq.bots.outputs.elasticsearch.output.replace_keys(obj, key_char='.', replacement='_')¶
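The rotation behaviour of get_index can be pictured with a short standalone sketch (the date format is illustrative, not the bot's full set of rotation options):

    # Sketch: derive a rotated index name from an event date, mirroring the
    # get_index behaviour described above. The format is illustrative.
    from datetime import date
    from typing import Optional

    def rotated_index(elastic_index: str, event_date: Optional[date],
                      default_string: str = 'unknown-date') -> str:
        if event_date is None:
            return f'{elastic_index}-{default_string}'  # e.g. intelmq-unknown-date
        return f'{elastic_index}-{event_date:%Y}'       # e.g. intelmq-2018

    print(rotated_index('intelmq', date(2018, 3, 1)))  # intelmq-2018
    print(rotated_index('intelmq', None))              # intelmq-unknown-date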
- intelmq.bots.outputs.file.output.BOT¶
alias of
FileOutputBot
- class intelmq.bots.outputs.file.output.FileOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Write events to a file
- static check(parameters)¶
The bot's own check function can perform individual checks on its parameters. init() is not called before; this is a static method which does not require class initialization.
- Parameters
parameters – Bot's parameters, defaults and runtime merged together
- Returns
None or a list of [log_level, log_message] pairs, both strings. log_level must be a valid log level.
- Return type
output
- encoding_errors_mode = 'strict'¶
- file: str = '/opt/intelmq/var/lib/bots/file-output/events.txt'¶
- format_filename: bool = False¶
- hierarchical_output: bool = False¶
- init()¶
- keep_raw_field: bool = False¶
- message_jsondict_as_string: bool = False¶
- message_with_type: bool = False¶
- open_file(filename: Optional[str] = None)¶
- process()¶
- shutdown()¶
- single_key: bool = False¶
- intelmq.bots.outputs.files.output.BOT¶
alias of
FilesOutputBot
- class intelmq.bots.outputs.files.output.FilesOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Write events lockfree into separate files
- _get_new_name(fd=None)¶
Creates unique filename (Maildir inspired)
- create_unique_file()¶
Safely creates machine-wide uniquely named file in tmp dir.
- dir: str = '/opt/intelmq/var/lib/bots/files-output/incoming'¶
- hierarchical_output: bool = False¶
- init()¶
- keep_raw_field: bool = False¶
- message_jsondict_as_string: bool = False¶
- message_with_type: bool = False¶
- process()¶
- single_key: bool = False¶
- suffix: str = '.json'¶
- tmp: str = '/opt/intelmq/var/lib/bots/files-output/tmp'¶
ESMOutputBot connects to McAfee Enterprise Security Manager and updates IP-based watchlists.
Parameters:
esm_ip: IP address of the ESM
esm_user: username to connect to the ESM
esm_password: password of esm_user
esm_watchlist: destination watchlist to update
field: field from the IntelMQ message to extract (e.g. destination.ip)
- intelmq.bots.outputs.mcafee.output_esm_ip.BOT¶
alias of
ESMIPOutputBot
- class intelmq.bots.outputs.mcafee.output_esm_ip.ESMIPOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Write events to the McAfee Enterprise Security Manager (ESM)
IntelMQ-Bot-Name: McAfee ESM IP
- esm_ip: str = '1.2.3.4'¶
- esm_password: str = None¶
- esm_user: str = 'NGCP'¶
- esm_watchlist: str = None¶
- field: str = 'source.ip'¶
- init()¶
- process()¶
Connect to a MISP instance and add event as MISPObject if not there already.
SPDX-FileCopyrightText: 2020 Intevation GmbH <https://intevation.de> SPDX-License-Identifier: AGPL-3.0-or-later
Funding of the initial version by SUNET. Author(s): Bernhard Reiter <bernhard@intevation.de>
A shortened copy of this documentation is kept at docs/user/bots.rst; please keep it current when changing something.
- param - add_feed_provider_as_tag
bool (use true when in doubt)
- param - add_feed_name_as_tag
bool (use true when in doubt)
- param - misp_additional_correlation_fields
list of fields for which the correlation flags will be enabled (in addition to those which are in significant_fields)
- param - misp_additional_tags
list of tags to set; these are not searched for when looking for duplicates
- param - misp_key
str, API key for accessing MISP
- param - misp_publish
bool, if a new MISP event should be set to “publish”. Expert setting as MISP may really make it “public”! (Use false when in doubt.)
- param - misp_tag_for_bot
str, used to mark MISP events
- param - misp_to_ids_fields
list of fields for which the to_ids flags will be set
- param - misp_url
str, URL of the MISP server
- param - significant_fields
list of intelmq field names
The significant_fields values will be searched for in all MISP attribute values, and if all values are found in one single MISP event, no new MISP event will be created. (The reason that all values are matched without considering the attribute type is a technical limitation of the search functionality exposed by the MISP/pymisp 2.4.120 API.) Instead, if the existing MISP events have the same feed.provider and match closely, their timestamp will be updated.
If a new MISP event is inserted the significant_fields and the misp_additional_correlation_fields will be the attributes where correlation is enabled.
Make sure to build the IntelMQ botnet in such a way that the rate of incoming events is one that MISP can handle, as IntelMQ can process events much faster than MISP (which is by design, as MISP is meant for manual handling). Also remove, with an expert bot, the fields of the IntelMQ events that you do not want to be inserted into MISP.
Example (of some parameters in JSON):
"add_feed_provider_as_tag": true,
"add_feed_name_as_tag": true,
"misp_additional_correlation_fields": ["source.asn"],
"misp_additional_tags": ["OSINT", "osint:certainty=="90""],
"misp_publish": false,
"misp_to_ids_fields": ["source.fqdn", "source.reverse_dns"],
"significant_fields": ["source.fqdn", "source.reverse_dns"],
Originally developed with pymisp v2.4.120 (which needs python v>=3.6).
- intelmq.bots.outputs.misp.output_api.BOT¶
alias of
MISPAPIOutputBot
- class intelmq.bots.outputs.misp.output_api.MISPAPIOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Insert events into a MISP instance
IntelMQ-Bot-Name: MISP API
- _insert_misp_event(intelmq_event)¶
Insert a new MISPEvent.
- _update_misp_event(misp_event, intelmq_event)¶
Update timestamp on a found MISPEvent if it matches closely.
- add_feed_name_as_tag: bool = True¶
- add_feed_provider_as_tag: bool = True¶
- static check(parameters)¶
The bot's own check function can perform individual checks on its parameters. init() is not called before; this is a static method which does not require class initialization.
- Parameters
parameters – Bot's parameters, defaults and runtime merged together
- Returns
None or a list of [log_level, log_message] pairs, both strings. log_level must be a valid log level.
- Return type
output
- init()¶
- misp_additional_correlation_fields = []¶
- misp_additional_tags = []¶
- misp_key: str = None¶
- misp_publish: bool = False¶
- misp_tag_for_bot: str = None¶
- misp_to_ids_fields = []¶
- misp_url: str = None¶
- process()¶
- significant_fields: list = []¶
- intelmq.bots.outputs.misp.output_feed.BOT¶
alias of
MISPFeedOutputBot
- class intelmq.bots.outputs.misp.output_feed.MISPFeedOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Generate an output in the MISP Feed format
- static check(parameters)¶
The bot's own check function can perform individual checks on its parameters. init() is not called before; this is a static method which does not require class initialization.
- Parameters
parameters – Bot's parameters, defaults and runtime merged together
- Returns
None or a list of [log_level, log_message] pairs, both strings. log_level must be a valid log level.
- Return type
output
- static check_output_dir(dirname)¶
- init()¶
- interval_event: str = '1 hour'¶
- misp_org_name = None¶
- misp_org_uuid = None¶
- output_dir: str = '/opt/intelmq/var/lib/bots/mispfeed-output'¶
- process()¶
The pymongo library automatically tries to reconnect if the connection has been lost.
- intelmq.bots.outputs.mongodb.output.BOT¶
alias of
MongoDBOutputBot
- class intelmq.bots.outputs.mongodb.output.MongoDBOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Send events to a MongoDB database
- collection = None¶
- connect()¶
- database = None¶
- db_pass = None¶
- db_user = None¶
- hierarchical_output: bool = False¶
- host: str = 'localhost'¶
- init()¶
- password = None¶
- port: int = 27017¶
- process()¶
- replacement_char = '_'¶
- shutdown()¶
- username = None¶
- intelmq.bots.outputs.redis.output.BOT¶
alias of
RedisOutputBot
- class intelmq.bots.outputs.redis.output.RedisOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Send events to a Redis database
- connect()¶
- hierarchical_output = False¶
- init()¶
- process()¶
- redis_db: int = 2¶
- redis_password: str = None¶
- redis_queue: str = None¶
- redis_server_ip = '127.0.0.1'¶
- redis_server_port = 6379¶
- redis_timeout = 5000¶
- with_type: bool = True¶
- intelmq.bots.outputs.restapi.output.BOT¶
alias of
RestAPIOutputBot
- class intelmq.bots.outputs.restapi.output.RestAPIOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Send events to a REST API listener through HTTP POST
- auth_token: str = None¶
- auth_token_name: str = None¶
- auth_type = None¶
- hierarchical_output: bool = False¶
- host: str = None¶
- init()¶
- process()¶
- use_json: bool = True¶
RPZ file output
SPDX-FileCopyrightText: 2021 Marius Karotkis <marius.karotkis@gmail.com> SPDX-License-Identifier: AGPL-3.0-or-later
- intelmq.bots.outputs.rpz_file.output.BOT¶
alias of
RpzFileOutputBot
- class intelmq.bots.outputs.rpz_file.output.RpzFileOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
- add_rpz_header()¶
- static check(parameters)¶
The bot's own check function can perform individual checks on its parameters. init() is not called before; this is a static method which does not require class initialization.
- Parameters
parameters – Bot's parameters, defaults and runtime merged together
- Returns
None or a list of [log_level, log_message] pairs, both strings. log_level must be a valid log level.
- Return type
output
- cname: str = ''¶
- dns_record_type: str = 'CNAME'¶
- encoding_errors_mode = 'strict'¶
- expire: int = 432000¶
- file: str = '/opt/intelmq/var/lib/bots/file-output/rpz'¶
- format_filename: bool = False¶
- generate_time: str = '2023-07-19 13:27:17'¶
- hierarchical_output: bool = False¶
- hostmaster_rpz_domain: str = ''¶
- init()¶
- keep_raw_field: bool = False¶
- message_jsondict_as_string: bool = False¶
- message_with_type: bool = False¶
- ncachttl: int = 60¶
- open_file(filename: Optional[str] = None)¶
- organization_name: str = ''¶
- process()¶
- refresh: int = 60¶
- retry: int = 60¶
- rpz_domain: str = ''¶
- rpz_email: str = ''¶
- serial: str = '2307191327'¶
- set_rpz_header()¶
- shutdown()¶
- single_key: bool = False¶
- test_domain: str = ''¶
- ttl: int = 3600¶
Request Tracker output bot
Creates a ticket in the specified queue.
Parameters:
rt_uri, rt_user, rt_password, verify_cert - RT API endpoint and credentials
queue - ticket destination queue
cf_mapping - mapping of event attributes to ticket CFs
final_status - the final status for the created ticket
create_investigation - whether to create an Investigation ticket (in case of an RTIR workflow)
fieldnames - attributes to include in the investigation ticket
description_attr - which event attribute contains the text message being sent to the recipient
- intelmq.bots.outputs.rt.output.BOT¶
alias of
RTOutputBot
- class intelmq.bots.outputs.rt.output.RTOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Request Tracker ticket creation bot. Creates a linked Investigation queue ticket if needed, according to the RTIR flow
- cf_mapping = {'classification.taxonomy': 'Classification', 'classification.type': 'Incident Type', 'event_description.text': 'Description', 'extra.incident.importance': 'Importance', 'extra.incident.severity': 'Incident Severity', 'extra.organization.name': 'Customer', 'source.ip': 'IP'}¶
- create_investigation: bool = False¶
- description_attr: str = 'event_description.text'¶
- final_status: str = 'resolved'¶
- init()¶
- investigation_fields: str = 'time.source,time.observation,source.ip,source.port,source.fqdn,source.url,classification.taxonomy,classification.type,classification.identifier,event_description.url,event_description.text,malware.name,protocol.application,protocol.transport'¶
- process()¶
- queue: str = 'Incidents'¶
- rt_password: str = None¶
- rt_uri: str = 'http://localhost/REST/1.0'¶
- rt_user: str = 'apiuser'¶
- verify_cert: bool = True¶
- intelmq.bots.outputs.smtp.output.BOT¶
alias of
SMTPOutputBot
- class intelmq.bots.outputs.smtp.output.SMTPOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Send single events as CSV attachment in dynamically formatted e-mails via SMTP
- fieldnames: str = 'classification.taxonomy,classification.type,classification.identifier,source.ip,source.asn,source.port'¶
- http_verify_cert: Union[bool, str] = True¶
- init()¶
- mail_from: str = 'cert@localhost'¶
- mail_to: str = '{ev[source.abuse_contact]}'¶
- process()¶
- smtp_host: str = 'localhost'¶
- smtp_password: Optional[str] = None¶
- smtp_port: int = 25¶
- smtp_username: Optional[str] = None¶
- ssl: bool = False¶
- starttls: bool = True¶
- subject: str = 'Incident in your AS {ev[source.asn]}'¶
- text: str = 'Dear network owner,\\n\\nWe have been informed that the following device might have security problems.\\n\\nYour localhost CERT'¶
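The {ev[...]} placeholders in subject, text and mail_to are str.format-style placeholders with the event passed as ev; a minimal sketch of how they expand (illustrative, not the bot's own code):

    # Sketch: how the {ev[...]} placeholders in subject and mail_to expand.
    event = {'source.asn': 64496, 'source.abuse_contact': 'abuse@example.com'}
    subject = 'Incident in your AS {ev[source.asn]}'.format(ev=event)
    mail_to = '{ev[source.abuse_contact]}'.format(ev=event)
    print(subject)  # Incident in your AS 64496
    print(mail_to)  # abuse@example.com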
SQL output bot.
See the SQL bot documentation for installation and configuration.
In case of errors, the bot tries to reconnect if the error is operational and thus temporary. We don't want to catch too much, like programming errors (missing fields etc.).
- intelmq.bots.outputs.sql.output.BOT¶
alias of
SQLOutputBot
- class intelmq.bots.outputs.sql.output.SQLOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Send events to a PostgreSQL or SQLite database
- autocommit = True¶
- database = 'intelmq-events'¶
- engine = None¶
- fields = None¶
- host = 'localhost'¶
- init()¶
- jsondict_as_string: bool = True¶
- password = None¶
- port = '5432'¶
- prepare_values(values)¶
- process()¶
- sslmode = 'require'¶
- table = 'events'¶
- user = 'intelmq'¶
- intelmq.bots.outputs.sql.output.itemgetter_tuple(*items)¶
- intelmq.bots.outputs.stomp.output.BOT¶
alias of
StompOutputBot
- class intelmq.bots.outputs.stomp.output.StompOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Send events to a STOMP server
- connect()¶
- exchange: str = '/exchange/_push'¶
- heartbeat: int = 60000¶
- http_verify_cert: Union[bool, str] = True¶
- init()¶
- keep_raw_field: bool = False¶
- message_hierarchical_output: bool = False¶
- message_jsondict_as_string: bool = False¶
- message_with_type: bool = False¶
- port: int = 61614¶
- process()¶
- server: str = '127.0.0.1'¶
- shutdown()¶
- single_key: bool = False¶
- ssl_ca_certificate: str = 'ca.pem'¶
- ssl_client_certificate: str = 'client.pem'¶
- ssl_client_certificate_key: str = 'client.key'¶
If an IntelMQ collector is on the other side, we can expect an "ok" to be sent back; for filebeat and others we cannot. As expecting the confirmation was the previous behavior, it is the default. https://github.com/certtools/intelmq/issues/1385
- intelmq.bots.outputs.tcp.output.BOT¶
alias of
TCPOutputBot
- class intelmq.bots.outputs.tcp.output.TCPOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Send events to a TCP server, such as Splunk, Elasticsearch, or another IntelMQ instance
- connect()¶
- counterpart_is_intelmq: bool = True¶
- hierarchical_output: bool = False¶
- init()¶
- ip: str = None¶
- port: int = None¶
- process()¶
- recvall(conn, n)¶
- separator: str = None¶
Templated SMTP output bot
SPDX-FileCopyrightText: 2021 Linköping University <https://liu.se/> SPDX-License-Identifier: AGPL-3.0-or-later
Sends a MIME Multipart message built from an event and static text using Jinja2 templates.
Templates are in Jinja2 format with the event provided in the variable “event”. E.g.:
mail_to: "{{ event['source.abuse_contact'] }}"
See the Jinja2 documentation at https://jinja.palletsprojects.com/ .
As an extension to the Jinja2 environment, the function “from_json” is available for parsing JSON strings into Python structures. This is useful if you want to handle complicated structures in the “output” field of an event. In that case, you would start your template with a line like:
{%- set output = from_json(event['output']) %}
and can then use “output” as a regular Python object in the rest of the template.
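Making such a helper available is a one-liner in Jinja2; a hedged sketch (not the bot's exact code):

    # Sketch: expose a from_json helper to Jinja2 templates, as described
    # above. Not the bot's exact implementation.
    import json
    from jinja2 import Environment

    env = Environment()
    env.globals['from_json'] = json.loads

    template = env.from_string("{%- set output = from_json(raw) %}{{ output['count'] }}")
    print(template.render(raw='{"count": 42}'))  # 42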
Attachments are template strings, especially useful for sending structured data. E.g. to send a JSON document including “malware.name” and all other fields starting with “source.”:
- attachments:
    content-type: application/json
    text: |
      {
        "malware": "{{ event['malware.name'] }}",
        {%- set comma = joiner(", ") %}
        {%- for key in event %}
          {%- if key.startswith('source.') %}
        {{ comma() }}"{{ key }}": "{{ event[key] }}"
          {%- endif %}
        {%- endfor %}
      }
    name: report.json
You are responsible for making sure that the text produced by the template is valid according to the content-type.
SMTP authentication is attempted if both “smtp_username” and “smtp_password” are provided.
Parameters:
- attachments: list of objects with structure:
    content-type: string, templated, content-type to use.
    text: string, templated, attachment text.
    name: string, templated, filename of attachment.
- body: string, optional, default see below, templated, body text. The default body template prints every field in the event except 'raw', in undefined order, one field per line, as "field: value".
- mail_from: string, templated, sender address.
- mail_to: string, templated, recipient addresses, comma-separated.
- smtp_host: string, optional, default "localhost", hostname of the SMTP server.
- smtp_password: string, default null, password (if any) for authenticated SMTP.
- smtp_port: integer, default 25, TCP port to connect to.
- smtp_username: string, default null, username (if any) for authenticated SMTP.
- tls: boolean, default false, whether to use SMTPS. If true, also set smtp_port to the SMTPS port.
- starttls: boolean, default true, whether to use opportunistic STARTTLS over SMTP.
- subject: string, optional, default "IntelMQ event", templated, e-mail subject line.
- verify_cert: boolean, default true, whether to verify the server certificate in STARTTLS or SMTPS.
- intelmq.bots.outputs.templated_smtp.output.BOT¶
alias of
TemplatedSMTPOutputBot
- class intelmq.bots.outputs.templated_smtp.output.TemplatedSMTPOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
- attachments: List[str] = []¶
- body: str = "{%- for field in event %}\n {%- if field != 'raw' %}\n{{ field }}: {{ event[field] }}\n {%- endif %}\n{%- endfor %}\n"¶
- init()¶
- mail_from: Optional[str] = None¶
- mail_to: Optional[str] = None¶
- password: Optional[str] = None¶
- process()¶
- smtp_host: str = 'localhost'¶
- smtp_port: int = 25¶
- ssl: bool = False¶
- starttls: bool = False¶
- subject: str = 'IntelMQ event'¶
- username: Optional[str] = None¶
- verify_cert: bool = True¶
Using pathlib.Path.touch(path) and os.utime(path) did not work - the ctime did not change in some cases.
- intelmq.bots.outputs.touch.output.BOT¶
alias of
TouchOutputBot
- intelmq.bots.outputs.udp.output.BOT¶
alias of
UDPOutputBot
- class intelmq.bots.outputs.udp.output.UDPOutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
OutputBot
Send events to a UDP server, e.g. a syslog daemon
- delimited(event)¶
- field_delimiter: str = '|'¶
- format: str = None¶
- header: str = '<header text>'¶
- init()¶
- keep_raw_field: bool = False¶
- process()¶
- remove_control_char(s)¶
- send(rawdata)¶
- udp_host: str = 'localhost'¶
- udp_port: int = None¶
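When format is set to delimited, the event is flattened into the header plus delimiter-joined text; a rough, purely illustrative sketch of such a serialization (the exact layout produced by the bot may differ):

    # Sketch: a purely illustrative 'delimited' serialization built from the
    # header and field_delimiter parameters; not the bot's exact layout.
    def delimited(event: dict, header: str = '<header text>', delimiter: str = '|') -> str:
        body = delimiter.join(f'{key}: {value}' for key, value in event.items())
        return header + delimiter + body

    print(delimited({'source.ip': '192.0.2.1', 'classification.type': 'scanner'}))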
- class intelmq.bots.parsers.abusech.parser_feodotracker.AbusechFeodoTrackerParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the Abuse.ch Feodo Tracker feed (json)
List of source fields:
['ip_address', 'port', 'status', 'hostname', 'as_number', 'as_name', 'country', 'first_seen', 'last_online', 'malware']
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: dict) str ¶
Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse. Using a string as input here is not possible, as the input may span over multiple lines. Output is not identical to the input, but has the same content.
- Parameters
line – The line as dict.
- Returns
The JSON-encoded line as string.
- Return type
str
- intelmq.bots.parsers.abusech.parser_feodotracker.BOT¶
alias of
AbusechFeodoTrackerParserBot
- class intelmq.bots.parsers.alienvault.parser.AlienVaultParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse data from the AlienVault API
- parse_line(row, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- intelmq.bots.parsers.alienvault.parser.BOT¶
alias of
AlienVaultParserBot
Events are gathered based on user subscriptions in AlienVault OTX. The data structure is described in detail here: https://github.com/AlienVault-Labs/OTX-Python-SDK/blob/master/howto_use_python_otx_api.ipynb
- class intelmq.bots.parsers.alienvault.parser_otx.AlienVaultOTXParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse data from the AlienVault OTX API
- parse_line(pulse, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: dict) str ¶
Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse. Using a string as input here is not possible, as the input may span over multiple lines. Output is not identical to the input, but has the same content.
- Parameters
line – The line as dict.
- Returns
The JSON-encoded line as string.
- Return type
str
- intelmq.bots.parsers.alienvault.parser_otx.BOT¶
alias of
AlienVaultOTXParserBot
AnubisNetworks Cyberfeed Stream parser
TODO: Refactor with JSON mapping
There is an old format and a new one - distinguishable by the test cases
Migration to ParserBot does not make sense, as there’s only one event per report anyway
- class intelmq.bots.parsers.anubisnetworks.parser.AnubisNetworksParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse single JSON-events from AnubisNetworks Cyberfeed stream
- event_add_fallback(event, key, value)¶
- init()¶
- parse_geo(event, value, namespace, raw_report, orig_name)¶
- process()¶
- use_malware_familiy_as_classification_identifier = True¶
- intelmq.bots.parsers.anubisnetworks.parser.BOT¶
alias of
AnubisNetworksParserBot
IntelMQ parser for Bambenek DGA, Domain, and IP feeds
- intelmq.bots.parsers.bambenek.parser.BOT¶
alias of
BambenekParserBot
- class intelmq.bots.parsers.bambenek.parser.BambenekParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Single parser for Bambenek feeds
- DGA_FEED = {'http://osint.bambenekconsulting.com/feeds/dga-feed.txt', 'https://faf.bambenekconsulting.com/feeds/dga-feed.txt', 'https://osint.bambenekconsulting.com/feeds/dga-feed.txt'}¶
- DOMMASTERLIST = {'http://osint.bambenekconsulting.com/feeds/c2-dommasterlist.txt', 'https://faf.bambenekconsulting.com/feeds/dga/c2-dommasterlist.txt', 'https://osint.bambenekconsulting.com/feeds/c2-dommasterlist.txt'}¶
- IPMASTERLIST = {'http://osint.bambenekconsulting.com/feeds/c2-ipmasterlist.txt', 'https://faf.bambenekconsulting.com/feeds/dga/c2-ipmasterlist.txt', 'https://osint.bambenekconsulting.com/feeds/c2-ipmasterlist.txt'}¶
- MALWARE_NAME_MAP = {'cl': 'cryptolocker', 'p2pgoz': 'p2p goz', 'ptgoz': 'pt goz', 'volatile': 'volatile cedar'}¶
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- intelmq.bots.parsers.blocklistde.parser.BOT¶
alias of
BlockListDEParserBot
- class intelmq.bots.parsers.blocklistde.parser.BlockListDEParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the Blocklist.DE feeds
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- intelmq.bots.parsers.blueliv.parser_crimeserver.BOT¶
alias of
BluelivCrimeserverParserBot
A bot to parse certstream data. @author: Christoph Giese (Telekom Security, CDR)
- intelmq.bots.parsers.calidog.parser_certstream.BOT¶
alias of
CertStreamParserBot
- class intelmq.bots.parsers.calidog.parser_certstream.CertStreamParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the CertStream feed
- parse(report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
You should do that for recovering lines too:
recover_line = ParserBot.recover_line_csv
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line)¶
Reverse of “parse” for single lines.
Recovers a fully functional report with only the problematic line by concatenating all strings in “self.tempdata” with “line” with LF newlines. Works fine for most text files.
- Parameters
line (Optional[str], optional) – The currently processed line which should be transferred into its original appearance. As fallback, "self._current_line" is used if available (depending on self.parse). The default is None.
- Raises
ValueError – If neither the parameter “line” nor the member “self._current_line” is available.
- Returns
str – The reconstructed raw data.
CERT-EU parser
"city"  # empty
"source location"  # just a combination of long and lat
"country"  # empty
"as name"  # empty
reported cc, reported as name: ignored intentionally
- intelmq.bots.parsers.cert_eu.parser_csv.BOT¶
alias of
CertEUCSVParserBot
- class intelmq.bots.parsers.cert_eu.parser_csv.CertEUCSVParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse CSV data of the CERT-EU feed
- ABUSE_TO_INTELMQ = {'backdoor': 'system-compromise', 'blacklist': 'blacklist', 'botnet drone': 'infected-system', 'brute-force': 'brute-force', 'c2server': 'c2-server', 'compromised server': 'system-compromise', 'ddos infrastructure': 'ddos', 'ddos target': 'ddos', 'defacement': 'unauthorised-information-modification', 'dropzone': 'other', 'exploit url': 'exploit', 'ids alert': 'ids-alert', 'malware url': 'malware-distribution', 'malware-configuration': 'malware-configuration', 'phishing': 'phishing', 'ransomware': 'infected-system', 'scanner': 'scanner', 'spam infrastructure': 'spam', 'test': 'test', 'vulnerable service': 'vulnerable-system'}¶
- parse(report: Report)¶
A basic CSV Dictionary parser. The resulting lines are dictionaries with the column names as keys.
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: Optional[Union[dict, str]] = None) str ¶
Converts dictionaries to csv. self.csv_fieldnames must be list of fields. Respect saved line ending.
- intelmq.bots.parsers.ci_army.parser.BOT¶
alias of
CIArmyParserBot
- intelmq.bots.parsers.cleanmx.parser.BOT¶
alias of
CleanMXParserBot
- class intelmq.bots.parsers.cleanmx.parser.CleanMXParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the CleanMX feeds
- get_mapping_and_type(url)¶
- parse(report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
You should do that for recovering lines too:
recover_line = ParserBot.recover_line_csv
- parse_line(entry_str, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- intelmq.bots.parsers.cymru.parser_cap_program.BOT¶
alias of
CymruCAPProgramParserBot
- class intelmq.bots.parsers.cymru.parser_cap_program.CymruCAPProgramParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the Cymru CAP Program feed
- parse(report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
You should do that for recovering lines too:
recover_line = ParserBot.recover_line_csv
- parse_bot_old(comment_split, report_type, event)¶
- parse_line_new(line, report)¶
The format is two following: category|address|asn|timestamp|optional_information|asninfo Therefore very similar to CSV, just with the pipe as separator category: the type (resulting in classification.*) and optional_information needs to be parsed differently per category address: source.ip asn: source.asn timestamp: time.source optional_information: needs special care.
For some categories it needs parsing, as it contains a mapping of keys to values, whereas the meaning of the keys can differ between the categories For categories in MAPING_COMMENT, this field only contains one value. For the category ‘bruteforce’ both situations apply. Previously, the bruteforce events only had the protocol in the comment, while most other categories had a mapping. Now, the bruteforce categories also uses the type-value syntax. So we need to support both formats, the old and the new. See also https://github.com/certtools/intelmq/issues/1794
asninfo: source.as_name
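To make the field mapping above concrete, here is a minimal hedged sketch, not the bot's actual implementation, of splitting one such pipe-separated line; the example line and the 'extra.comment' key are illustrative assumptions.

def split_cap_line(line: str) -> dict:
    category, address, asn, timestamp, optional_information, asninfo = line.split('|')
    return {
        'classification.type': category,        # mapped per category in the real parser
        'source.ip': address,
        'source.asn': int(asn),
        'time.source': timestamp,               # sanitized to UTC by the bot
        'extra.comment': optional_information,  # parsed per category in the real parser
        'source.as_name': asninfo,
    }

print(split_cap_line('scanner|192.0.2.1|64496|2021-01-01 00:00:00|ports: 22|EXAMPLE-AS'))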
- parse_line_old(line, report)¶
- intelmq.bots.parsers.cymru.parser_full_bogons.BOT¶
alias of
CymruFullBogonsParserBot
- class intelmq.bots.parsers.cymru.parser_full_bogons.CymruFullBogonsParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the Cymru Full Bogons feed
- parse(report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do that for recovering lines too.
recover_line = ParserBot.recover_line_csv
- parse_line(val: str, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- intelmq.bots.parsers.cznic.parser_haas.BOT¶
alias of
CZNICHaasParserBot
- class intelmq.bots.parsers.cznic.parser_haas.CZNICHaasParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
CZ.NIC HaaS Parser is the bot responsible for parsing the report and sanitizing the information
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: dict) str ¶
Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse. Using a string as input here is not possible, as the input may span over multiple lines. Output is not identical to the input, but has the same content.
- Parameters
line (dict) – The line as dict.
- Returns
The JSON-encoded line as string.
- Return type
str
- intelmq.bots.parsers.cznic.parser_proki.BOT¶
alias of
CZNICProkiParserBot
- class intelmq.bots.parsers.cznic.parser_proki.CZNICProkiParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the feed of malicious IP addresses on Czech networks
- parse(report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do that for recovering lines too.
recover_line = ParserBot.recover_line_csv
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: dict) str ¶
Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse. Using a string as input here is not possible, as the input may span over multiple lines. Output is not identical to the input, but has the same content.
- Parameters
line (dict) – The line as dict.
- Returns
The JSON-encoded line as string.
- Return type
str
- intelmq.bots.parsers.danger_rulez.parser.BOT¶
alias of
BruteForceBlockerParserBot
IntelMQ Dataplane Parser
- intelmq.bots.parsers.dataplane.parser.BOT¶
alias of
DataplaneParserBot
- class intelmq.bots.parsers.dataplane.parser.DataplaneParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the Dataplane feeds
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
# created: Tue, 22 Dec 2015 12:19:03 +0000
#
# Source IP is 0 padded so each byte is three digits long
# Reports: number of packets received
# Targets: number of target IPs that reported packets from this source
# First Seen: first time we saw a packet from this source
# Last Seen: last time we saw a packet from this source
# Updated: last time the record was updated
#
# IPs are removed if they have not been seen in 30 days.
#
# source IP <tab> Reports <tab> Targets <tab> First Seen <tab> Last Seen <tab> Updated <CR>
- intelmq.bots.parsers.dshield.parser_asn.BOT¶
alias of
DShieldASNParserBot
# primary URL: https://feeds.dshield.org/block.txt
# PGP Sign.: https://feeds.dshield.org/block.txt.asc
#
# updated: Tue Dec 15 15:33:38 2015 UTC
#
# This list summarizes the top 20 attacking class C (/24) subnets
# over the last three days. The number of 'attacks' indicates the
# number of targets reporting scans from this subnet.
#
# Columns (tab delimited):
# (1) start of netblock
# (2) end of netblock
# (3) subnet (/24 for class C)
# (4) number of targets scanned
# (5) name of Network
# (6) Country
# (7) contact email address
- intelmq.bots.parsers.dshield.parser_block.BOT¶
alias of
DshieldBlockParserBot
format: ponmocup-malware-IP ponmocup-malware-domain ponmocup-malware-URI-path ponmocup-htaccess-infected-domain
- intelmq.bots.parsers.dyn.parser.BOT¶
alias of
DynParserBot
- intelmq.bots.parsers.eset.parser.BOT¶
alias of
ESETParserBot
- class intelmq.bots.parsers.eset.parser.ESETParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse data collected from ESET’s TAXII API
- common_parse(event, line)¶
- static domains_parse(event, line)¶
- init()¶
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: dict) str ¶
Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse. Using a string as input here is not possible, as the input may span over multiple lines. Output is not identical to the input, but has the same content.
- Parameters
line (dict) – The line as dict.
- Returns
The JSON-encoded line as string.
- Return type
str
- static urls_parse(event, line)¶
Fireeye Parser Bot: retrieves a base64-encoded JSON string from raw and converts it into an event.
- intelmq.bots.parsers.fireeye.parser.BOT¶
alias of
FireeyeParserBot
The source provides a JSON file with a dictionary. The keys of this dict are identifiers and the values are lists of domains.
The first part of the identifiers, before the first underscore, can be treated as malware name. The feed provider committed to retain this schema.
An overview of all names can be found here: https://dgarchive.caad.fkie.fraunhofer.de/pcres
Generic CSV parser
Parameters:
columns: string
delimiter: string
default_url_protocol: string
skip_header: boolean
type: string
type_translation: string
data_type: string
A hedged configuration sketch follows after the class attributes below.
- intelmq.bots.parsers.generic.parser_csv.BOT¶
alias of
GenericCsvParserBot
- class intelmq.bots.parsers.generic.parser_csv.GenericCsvParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse generic CSV data. Lines starting with the character # are ignored. URLs without a protocol can be prefixed with a default value.
- column_regex_search: Optional[dict] = None¶
- columns: Union[str, Iterable] = None¶
- columns_required: Optional[dict] = None¶
- compose_fields: Optional[dict] = {}¶
- data_type: Optional[dict] = None¶
- default_url_protocol: str = 'http://'¶
- delimiter: str = ','¶
- filter_text = None¶
- filter_type = None¶
- init()¶
- parse(report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do that for recovering lines too.
recover_line = ParserBot.recover_line_csv
- parse_line(row: list, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: Optional[list] = None) str ¶
Recover csv line, respecting saved line ending.
- Parameter:
line: Optional line as list. If absent, the current line is used as string.
- skip_header: Union[bool, int] = False¶
- time_format = None¶
- type: Optional[str] = None¶
- type_translation = {}¶
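For illustration, hypothetical parameter values for GenericCsvParserBot, shown as a Python dict; the concrete values are assumptions, not a shipped configuration (in a deployment, these live in the bot's runtime configuration).

parameters = {
    'columns': 'time.source,source.ip,source.url',  # one harmonization field per CSV column
    'delimiter': ';',
    'skip_header': True,                # or an int: number of lines to skip
    'default_url_protocol': 'http://',
    'type': 'c2-server',                # fixed classification.type for all events
    'type_translation': '{"low": "blacklist"}',  # JSON string mapping feed values to types
}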
Github IOC feeds’ parser
- intelmq.bots.parsers.github_feed.parser.BOT¶
alias of
GithubFeedParserBot
- class intelmq.bots.parsers.github_feed.parser.GithubFeedParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse known GitHub feeds
- class StrangerealIntelDailyIOC(logger)¶
Bases:
object
- parse(event, json_content: dict)¶
Parse the specific feed to sufficient fields
- Parameters
event – output event object
json_content – IOC(s) in JSON format
- init()¶
- parse(report, json_content: dict)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do that for recovering lines too.
recover_line = ParserBot.recover_line_csv
- process()¶
- intelmq.bots.parsers.github_feed.parser.parse_domain_indicator(event, ioc_indicator: str)¶
- intelmq.bots.parsers.github_feed.parser.parse_hash_indicator(event, ioc_indicator: str, hash_type: str)¶
- intelmq.bots.parsers.github_feed.parser.parse_ip_indicator(event, ioc_indicator: str)¶
- intelmq.bots.parsers.github_feed.parser.parse_url_indicator(event, ioc_indicator: str)¶
There are two different formats: Breaches and Pastes. For Breaches, there are again two different variants:
* Callback Test: has the field ‘Email’; Breach is a list of dictionaries
* Real: has NO field ‘Email’; Breach is a dictionary
- intelmq.bots.parsers.hibp.parser_callback.BOT¶
alias of
HIBPCallbackParserBot
- class intelmq.bots.parsers.hibp.parser_callback.HIBPCallbackParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse reports of the ‘Have I Been Pwned’ Callback for Enterprise Subscribers
- parse(report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do that for recovering lines too.
recover_line = ParserBot.recover_line_csv
- parse_line(request, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line)¶
Reverse of “parse” for single lines.
Recovers a fully functional report with only the problematic line by concatenating all strings in “self.tempdata” with “line” with LF newlines. Works fine for most text files.
- Parameters
line (Optional[str], optional) – The currently processed line, which should be transferred into its original appearance. As fallback, “self._current_line” is used if available (depending on self.parse). The default is None.
- Raises
ValueError – If neither the parameter “line” nor the member “self._current_line” is available.
- Returns
- str
The reconstructed raw data.
HTML Table parser
Parameters:
columns: string
ignore_values: string
skip_table_head: boolean
attribute_name: string
attribute_value: string
table_index: int
split_column: string
split_separator: string
split_index: int
default_url_protocol: string
time_format: string
type: string
A hedged configuration sketch follows after the class attributes below.
- intelmq.bots.parsers.html_table.parser.BOT¶
alias of
HTMLTableParserBot
- class intelmq.bots.parsers.html_table.parser.HTMLTableParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse HTML table data
- attribute_name = ''¶
- attribute_value = ''¶
- columns = ['', 'source.fqdn']¶
- default_url_protocol = 'http://'¶
- ignore_values = None¶
- init()¶
- process()¶
- skip_table_head = True¶
- split_column = ''¶
- split_index = 0¶
- split_separator = None¶
- table_index = 0¶
- time_format = None¶
- type = 'c2-server'¶
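For illustration, hypothetical parameter values for HTMLTableParserBot, shown as a Python dict; the values are assumptions, not a shipped configuration.

parameters = {
    'columns': 'time.source,source.fqdn',  # one field per table column
    'skip_table_head': True,
    'attribute_name': 'class',   # locate the <table> by this HTML attribute...
    'attribute_value': 'data',   # ...and this value
    'table_index': 0,            # which matching table to use
    'default_url_protocol': 'http://',
    'type': 'c2-server',
}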
JSON Parser Bot: retrieves a base64-encoded JSON string from raw and converts it into an event.
- intelmq.bots.parsers.json.parser.BOT¶
alias of
JSONParserBot
Parse a string of key=value pairs.
Tokens which do not contain the kv_separator string are ignored.
Values cannot contain newlines.
- param pair_separator
string, default ‘ ‘, string separating key=value pairs
- param kv_separator
string, default ‘=’, string separating key and value
- param keys
array of strings to strings, names of keys -> names of fields to propagate
- param strip_quotes
boolean, default true, remove opening and closing double quotes. Note that quotes do not protect pair separation, so e.g. key=”long value” will still be split into ‘key: “long’ and ‘value”’.
- param timestamp_key
string, optional, key containing event timestamp. Numerical values are interpreted as UNIX seconds, others are parsed by dateutil.parser.parse(fuzzy=True). If parsing fails no timestamp field will be added.
- intelmq.bots.parsers.key_value.parser.BOT¶
alias of
KeyValueParserBot
- class intelmq.bots.parsers.key_value.parser.KeyValueParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse key=value strings
- init()¶
- keys = {}¶
- kv_separator = '='¶
- pair_separator = ' '¶
- parse_line(row, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- strip_quotes = True¶
- timestamp_key = None¶
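A hedged, self-contained sketch of what the key=value parser does with one input line under the documented defaults (pair_separator ' ', kv_separator '='); the keys mapping and the example line are assumptions.

line = 'src=192.0.2.1 dst=198.51.100.2 ts=1609459200'
keys = {'src': 'source.ip', 'dst': 'destination.ip'}  # hypothetical keys -> fields mapping

pairs = (token.split('=', 1) for token in line.split(' ') if '=' in token)
event = {keys[key]: value for key, value in pairs if key in keys}
print(event)  # {'source.ip': '192.0.2.1', 'destination.ip': '198.51.100.2'}
# With timestamp_key = 'ts', the numerical value would additionally be
# interpreted as UNIX seconds and become the event timestamp.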
- intelmq.bots.parsers.malwarepatrol.parser_dansguardian.BOT¶
alias of
DansParserBot
- class intelmq.bots.parsers.malwarepatrol.parser_dansguardian.DansParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the MalwarePatrol Dans Guardian feed
- parse(report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do that for recovering lines too.
recover_line = ParserBot.recover_line_csv
- parse_line(row, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- sourcetime = None¶
- intelmq.bots.parsers.malwareurl.parser.BOT¶
alias of
MalwareurlParserBot
ATDParserBot parses McAfee Advanced Threat Defense reports. This bot generates one message per identified IOC:
- hash values of the original sample and any identified dropped files
- IP addresses the sample tries to connect to
- FQDNs the sample tries to connect to
Parameter:
verdict_severity: defines the minimum severity of reports to be parsed; severity ranges from 1 to 5
- class intelmq.bots.parsers.mcafee.parser_atd.ATDParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse IoCs from McAfee Advanced Threat Defense reports (hash, IP, URL)
- ATD_TYPE_MAPPING = {'Ipv4': 'destination.ip', 'Md5': 'malware.hash.md5', 'Name': 'malware.name', 'Port': 'destination.port', 'Sha1': 'malware.hash.sha1', 'Sha256': 'malware.hash.sha256', 'Url': 'destination.fqdn', 'domain': 'source.fqdn', 'hostname': 'source.fqdn'}¶
- process()¶
- verdict_severity: int = 4¶
- intelmq.bots.parsers.mcafee.parser_atd.BOT¶
alias of
ATDParserBot
Parses BingMURLs data in JSON format.
- intelmq.bots.parsers.microsoft.parser_bingmurls.BOT¶
alias of
MicrosoftBingMurlsParserBot
- class intelmq.bots.parsers.microsoft.parser_bingmurls.MicrosoftBingMurlsParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse JSON data from Microsoft’s Bing Malicious URLs list
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: dict)¶
Reverse of “parse” for single lines.
Recovers a fully functional report with only the problematic line by concatenating all strings in “self.tempdata” with “line” with LF newlines. Works fine for most text files.
- Parameters
line (Optional[str], optional) – The currently processed line, which should be transferred into its original appearance. As fallback, “self._current_line” is used if available (depending on self.parse). The default is None.
- Raises
ValueError – If neither the parameter “line” nor the member “self._current_line” is available.
- Returns
- str
The reconstructed raw data.
Parses CTIP data in JSON format.
The key indicatorexpirationdatetime is ignored; its meaning is unknown.
There are two different variants of data
Interflow format: JSON format, MAPPING
Azure format: JSON stream format, a short example structure:
{ "DataFeed": "CTIP-Infected", "SourcedFrom": "SinkHoleMessage|SensorMessage"", "DateTimeReceivedUtc": nt time "DateTimeReceivedUtcTxt": human readable "Malware": "ThreatCode": "B67-SS-TINBA", "ThreatConfidence": "High|Medium|Low|Informational", -> 100/50/20/10 "TotalEncounters": 3, "TLP": "Amber", "SourceIp": "SourcePort": "DestinationIp": "DestinationPort": "TargetIp": Deprecated, so we gonne ignore it "TargetPort": Deprecated, so we gonne ignore it "SourceIpInfo": { "SourceIpAsnNumber": "SourceIpAsnOrgName": "SourceIpCountryCode": "SourceIpRegion": "SourceIpCity" "SourceIpPostalCode" "SourceIpLatitude" "SourceIpLongitude" "SourceIpMetroCode" "SourceIpAreaCode" "SourceIpConnectionType" }, "HttpInfo": { "HttpHost": "", "HttpRequest": "", "HttpMethod": "", "HttpReferrer": "", "HttpUserAgent": "", "HttpVersion": "" }, "CustomInfo": { "CustomField1": "", "CustomField2": "", "CustomField3": "", "CustomField4": "", "CustomField5": "" }, "Payload": base64 encoded json with meaningful dictionary keys or JSON-string with numbered dictionary keys }
- intelmq.bots.parsers.microsoft.parser_ctip.BOT¶
alias of
MicrosoftCTIPParserBot
- class intelmq.bots.parsers.microsoft.parser_ctip.MicrosoftCTIPParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse JSON data from Microsoft’s CTIP program
- overwrite: bool = True¶
- parse(report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do that for recovering lines too.
recover_line = ParserBot.recover_line_csv
- parse_azure(line, report)¶
- parse_interflow(line: dict, report)¶
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- intelmq.bots.parsers.misp.parser.BOT¶
alias of
MISPParserBot
- class intelmq.bots.parsers.misp.parser.MISPParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse MISP events
- MISP_TAXONOMY_MAPPING = {'ecsirt:abusive-content="spam"': 'spam', 'ecsirt:availability="ddos"': 'ddos', 'ecsirt:fraud="phishing"': 'phishing', 'ecsirt:information-content-security="dropzone"': 'other', 'ecsirt:information-gathering="scanner"': 'scanner', 'ecsirt:intrusion-attempts="brute-force"': 'brute-force', 'ecsirt:intrusion-attempts="exploit"': 'exploit', 'ecsirt:intrusion-attempts="ids-alert"': 'ids-alert', 'ecsirt:intrusions="backdoor"': 'system-compromise', 'ecsirt:intrusions="compromised"': 'system-compromise', 'ecsirt:intrusions="defacement"': 'unauthorised-information-modification', 'ecsirt:malicious-code="botnet-drone"': 'infected-system', 'ecsirt:malicious-code="c2server"': 'c2-server', 'ecsirt:malicious-code="malware"': 'infected-system', 'ecsirt:malicious-code="malware-configuration"': 'malware-configuration', 'ecsirt:malicious-code="ransomware"': 'infected-system', 'ecsirt:other="blacklist"': 'blacklist', 'ecsirt:other="unknown"': 'undetermined', 'ecsirt:test="test"': 'test', 'ecsirt:vulnerable="vulnerable-service"': 'vulnerable-system'}¶
- MISP_TYPE_MAPPING = {'domain': 'source.fqdn', 'email-src': 'source.account', 'hostname': 'source.fqdn', 'ip-dst': 'source.ip', 'ip-src': 'source.ip', 'md5': 'malware.hash.md5', 'sha1': 'malware.hash.sha1', 'url': 'source.url'}¶
- SUPPORTED_MISP_CATEGORIES = ['Payload delivery', 'Artifacts dropped', 'Payload installation', 'Network activity']¶
- process()¶
IntelMQ parser for Netlab 360 data feeds.
- intelmq.bots.parsers.netlab_360.parser.BOT¶
alias of
Netlab360ParserBot
- class intelmq.bots.parsers.netlab_360.parser.Netlab360ParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the Netlab 360 DGA, Hajime, Magnitude and Mirai feeds
- DGA_FEED = {'http://data.netlab.360.com/feeds/dga/dga.txt', 'https://data.netlab.360.com/feeds/dga/dga.txt'}¶
- HAJIME_SCANNER_FEED = {'http://data.netlab.360.com/feeds/hajime-scanner/bot.list', 'https://data.netlab.360.com/feeds/hajime-scanner/bot.list'}¶
- MAGNITUDE_FEED = {'http://data.netlab.360.com/feeds/ek/magnitude.txt', 'https://data.netlab.360.com/feeds/ek/magnitude.txt'}¶
- MIRAI_SCANNER_FEED = {'http://data.netlab.360.com/feeds/mirai-scanner/scanner.list', 'https://data.netlab.360.com/feeds/mirai-scanner/scanner.list'}¶
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- intelmq.bots.parsers.openphish.parser.BOT¶
alias of
OpenPhishParserBot
- intelmq.bots.parsers.openphish.parser_commercial.BOT¶
alias of
OpenPhishCommercialParserBot
- class intelmq.bots.parsers.openphish.parser_commercial.OpenPhishCommercialParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the OpenPhish feed
List of source fields: ['asn', 'asn_name', 'brand', 'country_code', 'country_name', 'discover_time', 'emails', 'family_id', 'host', 'ip', 'isotime', 'page_language', 'phishing_kit', 'screenshot', 'sector', 'ssl_cert_issued_by', 'ssl_cert_issued_to', 'ssl_cert_serial', 'tld', 'url']
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: dict) str ¶
Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse. Using a string as input here is not possible, as the input may span over multiple lines. Output is not identical to the input, but has the same content.
- Parameters
line (dict) – The line as dict.
- Returns
The JSON-encoded line as string.
- Return type
str
- intelmq.bots.parsers.phishtank.parser.BOT¶
alias of
PhishTankParserBot
- class intelmq.bots.parsers.phishtank.parser.PhishTankParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the PhishTank feed (JSON). List of source fields: ['phish_id', 'url', 'phish_detail_url', 'submission_time', 'verified', 'verification_time', 'online', 'target', 'details']
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: dict) str ¶
Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse. Using a string as input here is not possible, as the input may span over multiple lines. Output is not identical to the input, but has the same content.
- Parameters
line (dict) – The line as dict.
- Returns
The JSON-encoded line as string.
- Return type
str
This is an “all-in-one” parser for a lot of Shadowserver feeds. It depends on the configuration in the file “config.py”, which holds information on how to treat certain Shadowserver feeds. It uses the report field extra.file_name to determine which config should apply, so this field is required.
This parser will only work with CSV files named like 2019-01-01-scan_http-country-geo.csv.
Optional parameters:
overwrite: Bool, default False. If True, it keeps the report’s feed.name and does not override it with the corresponding feed name.
feedname: The fixed feed name to use if it should not be automatically detected.
- intelmq.bots.parsers.shadowserver.parser.BOT¶
alias of
ShadowserverParserBot
- class intelmq.bots.parsers.shadowserver.parser.ShadowserverParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse all ShadowServer feeds
- feedname = None¶
- init()¶
- overwrite = False¶
- parse(report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do that for recovering lines too.
recover_line = ParserBot.recover_line_csv
- parse_line(row, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: Optional[Union[dict, str]] = None) str ¶
Converts dictionaries to CSV. self.csv_fieldnames must be a list of fields. Respects the saved line ending.
Shadowserver JSON Parser
- intelmq.bots.parsers.shadowserver.parser_json.BOT¶
alias of
ShadowserverJSONParserBot
- class intelmq.bots.parsers.shadowserver.parser_json.ShadowserverJSONParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse all Shadowserver feeds in JSON format (data coming from the reports API)
- Parameters
feedname (str) – The name of the feed
- feedname = None¶
- get_value_from_config(data, entry)¶
Given a specific config, get the value for that data based on the entry
- init()¶
- overwrite = True¶
- parse(report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do that for recovering lines too.
recover_line = ParserBot.recover_line_csv
- parse_line(line: Any, report: Report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: dict) str ¶
Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse. Using a string as input here is not possible, as the input may span over multiple lines. Output is not identical to the input, but has the same content.
- Parameters
line (dict) – The line as dict.
- Returns
The JSON-encoded line as string.
- Return type
str
Shodan Stream Parser
- intelmq.bots.parsers.shodan.parser.BOT¶
alias of
ShodanParserBot
- exception intelmq.bots.parsers.shodan.parser.NoValueException(msg: Optional[str] = None)¶
Bases:
Exception
Raised in a conversion function in case the value cannot be used, e.g. when trying to get the first item of an empty list
- msg: Optional[str]¶
- class intelmq.bots.parsers.shodan.parser.ShodanParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse Shodan data collected via the Shodan API
- apply_mapping(mapping: Dict[str, Any], data: Dict[str, Any], key_path: Tuple[str, ...] = ()) Dict[str, Any] ¶
- ignore_errors = True¶
- minimal_mode = False¶
- process() None ¶
- intelmq.bots.parsers.shodan.parser._dict_dict_to_obj_list(x: Dict[str, Dict[str, Any]], identifier: str = 'identifier') List[Dict[str, Any]] ¶
convert e.g. {'OuterKey1': {'InnerKey1': 'Value1'}, 'OuterKey2': {'InnerKey2': 'Value2'}} to [{'identifier': 'OuterKey1', 'InnerKey1': 'Value1'}, {'identifier': 'OuterKey2', 'InnerKey2': 'Value2'}]
- intelmq.bots.parsers.shodan.parser._get_first(variable: List[Any]) Any ¶
get first element from list, if the list has any; raise NoValueException otherwise
- intelmq.bots.parsers.shodan.parser._get_first_fqdn(variable: List[str]) str ¶
get first valid FQDN from a list of strings
- intelmq.bots.parsers.shodan.parser._keys_conversion(x: Dict[str, Any]) List[str] ¶
extracts object keys to a list, for cases where the values they map to are empty/irrelevant
- intelmq.bots.parsers.shodan.parser._maybe_single_to_list(x: Any) List[Any] ¶
converts non-list objects to lists with a single item and leaves lists as-is, used to harmonize fields which avoid lists when a single value is given
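A hedged re-implementation of the documented _dict_dict_to_obj_list behaviour, purely to illustrate the transformation it performs; this is a sketch, not the module's actual code.

from typing import Any, Dict, List

def dict_dict_to_obj_list(x: Dict[str, Dict[str, Any]], identifier: str = 'identifier') -> List[Dict[str, Any]]:
    # Attach each outer key as an 'identifier' entry of its inner dict.
    return [{identifier: outer_key, **inner} for outer_key, inner in x.items()]

print(dict_dict_to_obj_list({'OuterKey1': {'InnerKey1': 'Value1'},
                             'OuterKey2': {'InnerKey2': 'Value2'}}))
# [{'identifier': 'OuterKey1', 'InnerKey1': 'Value1'},
#  {'identifier': 'OuterKey2', 'InnerKey2': 'Value2'}]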
Header of the file:
; Bots filtered by last 1 hours, prepared for <CERTNAME> on UTC = …
; Copyright © 2015 The Spamhaus Project Ltd. All rights reserved.
; No re-distribution or public access allowed without Spamhaus permission.
; Fields description:
;
; 1 - Infected IP
; 2 - ASN
; 3 - Country Code
; 4 - Lastseen Timestamp (in UTC)
; 5 - Bot Name
;   Command & Control (C&C) information, if available:
; 6 - C&C Domain
; 7 - Remote IP (connecting to)
; 8 - Remote Port (connecting to)
; 9 - Local Port
; 10 - Protocol
;   Additional fields may be added in the future without notice
;
; ip, asn, country, lastseen, botname, domain, remote_ip, remote_port, local_port, protocol
- class intelmq.bots.parsers.spamhaus.parser_cert.SpamhausCERTParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the Spamhaus CERT feed
- parse_line(row, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
Single IntelMQ parser for Spamhaus drop feeds
- intelmq.bots.parsers.spamhaus.parser_drop.BOT¶
alias of
SpamhausDropParserBot
- class intelmq.bots.parsers.spamhaus.parser_drop.SpamhausDropParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the Spamhaus DROP, EDROP, DROPv6, and ASN-DROP feeds
- ASN_DROP_URLS = {'https://www.spamhaus.org/drop/asndrop.txt'}¶
- NETWORK_DROP_URLS = {'https://www.spamhaus.org/drop/drop.lasso', 'https://www.spamhaus.org/drop/drop.txt', 'https://www.spamhaus.org/drop/dropv6.txt', 'https://www.spamhaus.org/drop/edrop.txt'}¶
- parse_line(line, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
Only parses hidden iframes and conditional redirections, not encoded JavaScript.
- intelmq.bots.parsers.sucuri.parser.BOT¶
alias of
SucuriParserBot
- intelmq.bots.parsers.surbl.parser.BOT¶
alias of
SurblParserBot
- intelmq.bots.parsers.threatminer.parser.BOT¶
alias of
ThreatminerParserBot
- intelmq.bots.parsers.turris.parser.BOT¶
alias of
TurrisGreylistParserBot
Parser of text intended to obtain IOCs from tweets. First, substitutions are performed, and then each word in the text is compared with ‘(/|^)([a-z0-9.-]+.[a-z0-9]+?)([/:]|$)’. In the case of a match, it is checked whether this can be a valid domain using get_tld. There is also a whitelist for filtering out good domains. A hedged sketch of the extraction step follows after this parameter list.
- param domain_whitelist
domains that will be ignored in parsing
- param substitutions
semicolon-separated list of pairs of substitutions that will be made in the text; for example ” .com,.com” enables parsing of one fuzzy format and “[.];.” enables parsing of another fuzzy format
- param classification_type
string with a valid classification.type value
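A hedged sketch of the word-matching step described above; the dot inside the second group is escaped here for clarity, and the get_tld validation and whitelist check are omitted.

import re

PATTERN = re.compile(r'(/|^)([a-z0-9.-]+\.[a-z0-9]+?)([/:]|$)')

def extract_candidates(text: str) -> list:
    candidates = []
    for word in text.lower().split():
        match = PATTERN.search(word)
        if match:
            candidates.append(match.group(2))
    return candidates

print(extract_candidates('new c2 at evil-domain.example see t.co/xyz'))
# ['evil-domain.example', 't.co']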
- intelmq.bots.parsers.twitter.parser.BOT¶
alias of
TwitterParserBot
- class intelmq.bots.parsers.twitter.parser.TwitterParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse tweets and extract IoC data. Currently only URLs are supported; a whitelist of safe domains can be provided
- classification_type: str = 'blacklist'¶
- default_scheme: Optional[str] = None¶
- domain_whitelist: str = 't.co'¶
- get_data_from_text(text) list ¶
- get_domain(address)¶
- in_whitelist(domain: str) bool ¶
- init()¶
- process()¶
- substitutions: str = '.net;[.]net'¶
- intelmq.bots.parsers.vxvault.parser.BOT¶
alias of
VXVaultParserBot
- class intelmq.bots.parsers.vxvault.parser.VXVaultParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the VXVault feed
- parse(report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do that for recovering lines too.
recover_line = ParserBot.recover_line_csv
- parse_line(row, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line)¶
Reverse of “parse” for single lines.
Recovers a fully functional report with only the problematic line by concatenating all strings in “self.tempdata” with “line” with LF newlines. Works fine for most text files.
- Parameters
line (Optional[str], optional) – The currently processed line, which should be transferred into its original appearance. As fallback, “self._current_line” is used if available (depending on self.parse). The default is None.
- Raises
ValueError – If neither the parameter “line” nor the member “self._current_line” is available.
- Returns
- str
The reconstructed raw data.
- intelmq.bots.parsers.webinspektor.parser.BOT¶
alias of
WebinspektorParserBot
ZoneH CSV defacement report parser
- intelmq.bots.parsers.zoneh.parser.BOT¶
alias of
ZoneHParserBot
- class intelmq.bots.parsers.zoneh.parser.ZoneHParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
ParserBot
Parse the ZoneH CSV feed
- parse(report: Report)¶
A basic CSV Dictionary parser. The resulting lines are dictionaries with the column names as keys.
- parse_line(row, report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- recover_line(line: Optional[str] = None) str ¶
Reverse of “parse” for single lines.
Recovers a fully functional report with only the problematic line by concatenating all strings in “self.tempdata” with “line” with LF newlines. Works fine for most text files.
- Parameters
line (Optional[str], optional) – The currently processed line, which should be transferred into its original appearance. As fallback, “self._current_line” is used if available (depending on self.parse). The default is None.
- Raises
ValueError – If neither the parameter “line” nor the member “self._current_line” is available.
- Returns
- str
The reconstructed raw data.
intelmq.lib package¶
CacheMixin for IntelMQ
CacheMixin is used for caching/storing data in redis.
- class intelmq.lib.mixins.cache.CacheMixin(**kwargs)¶
Bases:
object
- cache_exists(key: str)¶
- cache_flush()¶
Flushes the currently opened database by calling FLUSHDB.
- cache_get(key: str)¶
- cache_get_redis_instance()¶
- cache_set(key: str, value: Any, ttl: Optional[int] = None)¶
- redis_cache_db: int = 9¶
- redis_cache_host: str = '127.0.0.1'¶
- redis_cache_password: Optional[str] = None¶
- redis_cache_port: int = 6379¶
- redis_cache_ttl: int = 15¶
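A hedged sketch of a bot using CacheMixin for simple deduplication; the bot class and the choice of cache key are assumptions, not shipped code.

from intelmq.lib.bot import ExpertBot
from intelmq.lib.mixins import CacheMixin

class ExampleDeduplicatorExpert(ExpertBot, CacheMixin):  # hypothetical bot
    redis_cache_db: int = 9       # documented default
    redis_cache_ttl: int = 3600   # keep seen keys for one hour

    def process(self):
        event = self.receive_message()
        key = event.get('source.ip')  # simplistic deduplication key (assumption)
        if key and self.cache_exists(key):
            self.acknowledge_message()  # duplicate: drop it
            return
        if key:
            self.cache_set(key, 1, ttl=self.redis_cache_ttl)
        self.send_message(event)
        self.acknowledge_message()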
HttpMixin for IntelMQ
Based on create_request_session in intelmq.lib.utils and set_request_parameters in intelmq.lib.bot.Bot
- class intelmq.lib.mixins.http.HttpMixin(**kwargs)¶
Bases:
object
Setup a request session
- http_get(url: str, **kwargs) Response ¶
- http_header: dict = {}¶
- http_password = None¶
- http_proxy = None¶
- http_session() Session ¶
- http_timeout_max_tries: int = 3¶
- http_timeout_sec: int = 30¶
- http_user_agent: str = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'¶
- http_username = None¶
- http_verify_cert: bool = True¶
- https_proxy = None¶
- setup()¶
- ssl_client_cert = None¶
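A hedged sketch of a collector using HttpMixin; the bot class and the http_url parameter are assumptions.

from intelmq.lib.bot import CollectorBot
from intelmq.lib.mixins import HttpMixin

class ExampleHTTPCollector(CollectorBot, HttpMixin):  # hypothetical bot
    http_url: str = 'https://feeds.example.com/data.csv'  # hypothetical parameter

    def process(self):
        response = self.http_get(self.http_url)  # honours timeout, retry and proxy settings
        report = self.new_report()
        report.add('raw', response.text)
        self.send_message(report)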
- class intelmq.lib.mixins.http.TimeoutHTTPAdapter(*args, timeout=None, **kwargs)¶
Bases:
HTTPAdapter
A requests-HTTP Adapter which can set the timeout generally.
- send(*args, **kwargs)¶
Sends PreparedRequest object. Returns Response object.
- Parameters
request – The PreparedRequest being sent.
stream – (optional) Whether to stream the request content.
timeout (float or tuple or urllib3 Timeout object) – (optional) How long to wait for the server to send data before giving up, as a float, or a (connect timeout, read timeout) tuple.
verify – (optional) Either a boolean, in which case it controls whether we verify the server’s TLS certificate, or a string, in which case it must be a path to a CA bundle to use
cert – (optional) Any user-provided SSL certificate to be trusted.
proxies – (optional) The proxies dictionary to apply to the request.
- Return type
requests.Response
SQLMixin for IntelMQ
Based on the former SQLBot base class
- class intelmq.lib.mixins.sql.SQLMixin(*args, **kwargs)¶
Bases:
object
Inherit this mixin so that it handles the DB connection for you. You do not have to bother with:
* connecting to the database in the self.init() method, just call super().init(); self.cur will be set
* catching exceptions, just call self.execute() instead of self.cur.execute()
* self.format_char will be set to ‘%s’ for PostgreSQL and to ‘?’ for SQLite
- MSSQL = 'mssql'¶
- POSTGRESQL = 'postgresql'¶
- SQLITE = 'sqlite'¶
- engine = None¶
- execute(query: str, values: tuple, rollback=False)¶
- message_jsondict_as_string = True¶
- reconnect_delay = 0¶
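A hedged sketch of an output bot using SQLMixin; the table and column names are assumptions.

from intelmq.lib.bot import OutputBot
from intelmq.lib.mixins import SQLMixin

class ExampleSQLOutput(OutputBot, SQLMixin):  # hypothetical bot
    def init(self):
        super().init()  # connects to the database; self.cur becomes available

    def process(self):
        event = self.receive_message()
        # self.format_char is '%s' for PostgreSQL and '?' for SQLite
        query = 'INSERT INTO events (ip) VALUES ({})'.format(self.format_char)
        self.execute(query, (event.get('source.ip'),))  # exceptions are handled for you
        self.acknowledge_message()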
- class intelmq.lib.mixins.CacheMixin(**kwargs)¶
Bases:
object
- cache_exists(key: str)¶
- cache_flush()¶
Flushes the currently opened database by calling FLUSHDB.
- cache_get(key: str)¶
- cache_get_redis_instance()¶
- cache_set(key: str, value: Any, ttl: Optional[int] = None)¶
- redis_cache_db: int = 9¶
- redis_cache_host: str = '127.0.0.1'¶
- redis_cache_password: Optional[str] = None¶
- redis_cache_port: int = 6379¶
- redis_cache_ttl: int = 15¶
- class intelmq.lib.mixins.HttpMixin(**kwargs)¶
Bases:
object
Setup a request session
- http_get(url: str, **kwargs) Response ¶
- http_header: dict = {}¶
- http_password = None¶
- http_proxy = None¶
- http_session() Session ¶
- http_timeout_max_tries: int = 3¶
- http_timeout_sec: int = 30¶
- http_user_agent: str = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'¶
- http_username = None¶
- http_verify_cert: bool = True¶
- https_proxy = None¶
- setup()¶
- ssl_client_cert = None¶
- class intelmq.lib.mixins.SQLMixin(*args, **kwargs)¶
Bases:
object
Inherit this mixin so that it handles the DB connection for you. You do not have to bother with:
* connecting to the database in the self.init() method, just call super().init(); self.cur will be set
* catching exceptions, just call self.execute() instead of self.cur.execute()
* self.format_char will be set to ‘%s’ for PostgreSQL and to ‘?’ for SQLite
- MSSQL = 'mssql'¶
- POSTGRESQL = 'postgresql'¶
- SQLITE = 'sqlite'¶
- engine = None¶
- execute(query: str, values: tuple, rollback=False)¶
- message_jsondict_as_string = True¶
- reconnect_delay = 0¶
- The bot library has the base classes for all bots.
Bot: generic base class for all kinds of bots
CollectorBot: base class for collectors
ParserBot: base class for parsers
- class intelmq.lib.bot.Bot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
object
Not to be reset when initialized again on reload.
- classmethod _create_argparser()¶
see https://github.com/certtools/intelmq/pull/1524/files#r464606370 why this code is not in the constructor
- _parse_common_parameters()¶
Parses and sanitizes commonly used parameters:
extract_files
- _parse_extract_file_parameter(parameter_name: str = 'extract_files')¶
Parses and sanitizes commonly used parameters:
extract_files
- accuracy: int = 100¶
- acknowledge_message()¶
Acknowledges that the last message has been processed, if any.
For bots without source pipeline (collectors), this is a no-op.
- static check(parameters: dict) Optional[List[List[str]]] ¶
The bot’s own check function can perform individual checks on its parameters. init() is not called before; this is a static method which does not require class initialization.
- Parameters
parameters – Bot’s parameters, defaults and runtime merged together
- Returns
- None or a list of [log_level, log_message] pairs, both
strings. log_level must be a valid log level.
- Return type
output
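A hedged sketch of a bot-specific check() implementation returning [log_level, log_message] pairs as described above; the checked api_key parameter is an assumption.

from typing import List, Optional

class ExampleBot:  # hypothetical bot; only the check hook is shown
    @staticmethod
    def check(parameters: dict) -> Optional[List[List[str]]]:
        results = []
        if not parameters.get('api_key'):  # hypothetical required parameter
            results.append(['error', 'Parameter api_key is required but missing.'])
        return results or None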
- description: Optional[str] = None¶
- destination_pipeline_broker: str = 'redis'¶
- destination_pipeline_db: int = 2¶
- destination_pipeline_host: str = '127.0.0.1'¶
- destination_pipeline_password: Optional[str] = None¶
- destination_pipeline_port: int = 6379¶
- destination_queues: dict = {}¶
- enabled: bool = True¶
- error_dump_message: bool = True¶
- error_log_exception: bool = True¶
- error_log_message: bool = False¶
- error_max_retries: int = 3¶
- error_procedure: str = 'pass'¶
- error_retry_delay: int = 15¶
- group: Optional[str] = None¶
- property harmonization¶
- http_proxy: Optional[str] = None¶
- http_timeout_max_tries: int = 3¶
- http_timeout_sec: int = 30¶
- http_user_agent: str = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'¶
- http_verify_cert: Union[bool, str] = True¶
- https_proxy: Optional[str] = None¶
- init()¶
- instances_threads: int = 0¶
- is_multithreaded: bool = False¶
- load_balance: bool = False¶
- log_processed_messages_count: int = 500¶
- log_processed_messages_seconds: int = 900¶
- logging_handler: str = 'file'¶
- logging_level: str = 'INFO'¶
- logging_path: str = '/opt/intelmq/var/log/'¶
- logging_syslog: str = '/dev/log'¶
- module = None¶
- name: Optional[str] = None¶
- new_event(*args, **kwargs)¶
- process_manager: str = 'intelmq'¶
- rate_limit: int = 0¶
- receive_message() Message ¶
If the bot is reloaded while waiting for an incoming message, the received message will first be rejected back to the pipeline to get to a clean state. Then, after reloading, the message will be retrieved again.
- classmethod run(parsed_args=None)¶
- run_mode: str = 'continuous'¶
- send_message(*messages, path: str = '_default', auto_add=None, path_permissive: bool = False)¶
- Parameters
messages – Instances of intelmq.lib.message.Message class
auto_add – ignored
path_permissive – If true, do not raise an error if the path is not configured
- set_request_parameters()¶
- shutdown()¶
- source_pipeline_broker: str = 'redis'¶
- source_pipeline_db: int = 2¶
- source_pipeline_host: str = '127.0.0.1'¶
- source_pipeline_password: Optional[str] = None¶
- source_pipeline_port: int = 6379¶
- source_queue: Optional[str] = None¶
- ssl_ca_certificate: Optional[str] = None¶
- start(starting: bool = True, error_on_pipeline: bool = True, error_on_message: bool = False, source_pipeline: Optional[str] = None, destination_pipeline: Optional[str] = None)¶
- statistics_database: int = 3¶
- statistics_host: str = '127.0.0.1'¶
- statistics_password: Optional[str] = None¶
- statistics_port: int = 6379¶
- stop(exitcode: int = 1)¶
- class intelmq.lib.bot.CollectorBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
Bot
Base class for collectors.
Performs some sanity checks on message sending.
- accuracy: int = 100¶
- bottype = 'Collector'¶
- code: Optional[str] = None¶
- documentation: Optional[str] = None¶
- name: Optional[str] = None¶
- new_report()¶
- provider: Optional[str] = None¶
- send_message(*messages, path: str = '_default', auto_add: bool = True)¶
- Parameters
messages – Instances of intelmq.lib.message.Message class
path – Named queue the message will be sent to
auto_add – Add some default report fields from parameters
- class intelmq.lib.bot.ExpertBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
Bot
Base class for expert bots.
- bottype = 'Expert'¶
- class intelmq.lib.bot.OutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
Bot
Base class for outputs.
- bottype = 'Output'¶
- export_event(event: Event, return_type: Optional[type] = None) Union[str, dict] ¶
- exports an event according to the following parameters:
message_hierarchical
message_with_type
message_jsondict_as_string
single_key
keep_raw_field
- Parameters
return_type – Ensure that the returned value is of the given type. Optional. For example: str. If the resulting value is not an instance of this type, the given type is called with the value as parameter, e.g. str(retval).
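A hedged sketch of export_event used inside an output bot's process() method; shown in isolation, and the file path sink is an assumption.

# Method-body sketch; assumed to live in an OutputBot subclass.
def process(self):
    event = self.receive_message()
    line = self.export_event(event, return_type=str)  # applies single_key, keep_raw_field, etc.
    with open('/tmp/events.txt', 'a') as handle:  # hypothetical sink
        handle.write(line + '\n')
    self.acknowledge_message()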
- class intelmq.lib.bot.ParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)¶
Bases:
Bot
- _get_io_and_save_line_ending(raw: str) StringIO ¶
Prepare StringIO and save the original line ending
The line ending is saved in self._line_ending. The default value is \r\n, the same as the default used by the csv module.
- bottype = 'Parser'¶
- default_fields: Optional[dict] = {}¶
- parse(report: Report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do that for recovering lines too.
recover_line = ParserBot.recover_line_csv
- parse_csv_dict(report: Report)¶
A basic CSV Dictionary parser. The resulting lines are dictionaries with the column names as keys.
- parse_line(line: Any, report: Report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- process()¶
- recover_line(line: Optional[str] = None) str ¶
Reverse of “parse” for single lines.
Recovers a fully functional report with only the problematic line by concatenating all strings in “self.tempdata” with “line” with LF newlines. Works fine for most text files.
- Parameters
line (Optional[str], optional) – The currently processed line, which should be transferred into its original appearance. As fallback, “self._current_line” is used if available (depending on self.parse). The default is None.
- Raises
ValueError – If neither the parameter “line” nor the member “self._current_line” is available.
- Returns
- str
The reconstructed raw data.
- recover_line_csv(line: Optional[list] = None) str ¶
Recover csv line, respecting saved line ending.
- Parameter:
line: Optional line as list. If absent, the current line is used as string.
- recover_line_csv_dict(line: Optional[Union[dict, str]] = None) str ¶
Converts dictionaries to CSV. self.csv_fieldnames must be a list of fields. Respects the saved line ending.
- recover_line_json(line: dict) str ¶
Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse. Using a string as input here is not possible, as the input may span over multiple lines. Output is not identical to the input, but has the same content.
- Parameters
line (dict) – The line as dict.
- Returns
The JSON-encoded line as string.
- Return type
str
- recover_line_json_stream(line: Optional[str] = None) str ¶
recover_line for JSON streams (one JSON element per line, no outer structure), just returns the current line, unparsed.
- Parameters
line – The line itself as dict, if available, falls back to original current line
- Returns
unparsed JSON line.
- Return type
str
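Putting the ParserBot hooks above together, a minimal hedged subclass sketch; the column layout and the classification value are assumptions.

from intelmq.lib.bot import ParserBot

class ExampleCSVParserBot(ParserBot):  # hypothetical parser
    parse = ParserBot.parse_csv
    recover_line = ParserBot.recover_line_csv

    def parse_line(self, line, report):
        # 'line' is a list of CSV values here; the column layout is an assumption
        event = self.new_event(report)
        event.add('time.source', line[0])
        event.add('source.ip', line[1])
        event.add('classification.type', 'scanner')
        event.add('raw', self.recover_line())
        yield event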
Utilities for debugging intelmq bots.
BotDebugger is called via intelmqctl. It starts a live running bot instance, raises the logging level to DEBUG, and permits even a non-skilled programmer, who may find themselves puzzled by Python nuances and server deployment twists, to see what’s happening in the bot and where the error is.
- Depending on the subcommand received, the class either
starts the bot as is (default)
processes single message, either injected or from default pipeline (process subcommand)
reads the message from input pipeline or send a message to output pipeline (message subcommand)
- class intelmq.lib.bot_debugger.BotDebugger(runtime_configuration, bot_id, run_subcommand=None, console_type=None, message_kind=None, dryrun=None, msg=None, show=None, loglevel=None)¶
Bases:
object
- EXAMPLE = '\nThe message may look like:\n \'{"source.network": "178.72.192.0/18", "time.observation": "2017-05-12T05:23:06+00:00"}\' '¶
- arg2msg(msg)¶
- instance = None¶
- leverageLogger(level)¶
- load_configuration() dict ¶
Load JSON or YAML configuration file.
- Parameters
configuration_filepath – Path to file to load.
- Returns
Parsed configuration
- Return type
config
- Raises
ValueError – if file not found
- static load_configuration_patch(configuration_filepath: str, *args, **kwargs) dict ¶
Mock function for utils.load_configuration which ensures the logging level parameter is set to the value we want. If a runtime configuration is detected, the logging_level parameter is
- inserted in all bots’ parameters (bot_id is not accessible here, hence we add it everywhere)
- inserted in the global parameters (ex-defaults)
Maybe not everything is necessary, but this makes sure the logging_level is set everywhere it might be relevant, also in the future.
- logging_level = None¶
- messageWizzard(msg)¶
- output = []¶
- outputappend(msg)¶
- static pprint(msg) str ¶
We can’t use standard pprint as JSON standard asks for double quotes.
- run() str ¶
Cache is a set of information already seen by the system. This provides a way, for example, to remove duplicated events and reports in the system, or to cache results from experts like Cymru Whois. It is possible to define a TTL value for each piece of information inserted into the cache; this TTL determines how long the system will keep the information in the cache.
- class intelmq.lib.datatypes.BotType(value)¶
Bases:
str, Enum
An enumeration.
- COLLECTOR = 'Collector'¶
- EXPERT = 'Expert'¶
- OUTPUT = 'Output'¶
- PARSER = 'Parser'¶
- toJson()¶
IntelMQ Exception Class
- exception intelmq.lib.exceptions.ConfigurationError(config: str, argument: str)¶
Bases:
IntelMQException
- exception intelmq.lib.exceptions.IntelMQException(message)¶
Bases:
Exception
- exception intelmq.lib.exceptions.IntelMQHarmonizationException(message)¶
Bases:
IntelMQException
- exception intelmq.lib.exceptions.InvalidArgument(argument: Any, got: Optional[Any] = None, expected=None, docs: Optional[str] = None)¶
Bases:
IntelMQException
- exception intelmq.lib.exceptions.InvalidKey(key: str)¶
Bases:
IntelMQHarmonizationException, KeyError
- exception intelmq.lib.exceptions.InvalidValue(key: str, value: str, reason: Optional[Any] = None, object: Optional[bytes] = None)¶
- exception intelmq.lib.exceptions.KeyExists(key: str)¶
- exception intelmq.lib.exceptions.KeyNotExists(key: str)¶
- exception intelmq.lib.exceptions.MissingDependencyError(dependency: str, version: Optional[str] = None, installed: Optional[str] = None, additional_text: Optional[str] = None)¶
Bases:
IntelMQException
A missing dependency was detected. Log instructions on installation.
- __init__(dependency: str, version: Optional[str] = None, installed: Optional[str] = None, additional_text: Optional[str] = None)¶
- Parameters
dependency (str) – The dependency name.
version (Optional[str], optional) – The required version. The default is None.
installed (Optional[str], optional) – The currently installed version. Requires ‘version’ to be given. The default is None.
additional_text (Optional[str], optional) – Arbitrary additional text to show. The default is None.
- Returns
The exception instance with prepared text.
- exception intelmq.lib.exceptions.PipelineError(argument: Union[str, Exception])¶
Bases:
IntelMQException
The following types are implemented with sanitize() and is_valid() functions:
Base64
Boolean
ClassificationTaxonomy
ClassificationType
DateTime
FQDN
Float
Accuracy
GenericType
IPAddress
IPNetwork
Integer
JSON
JSONDict
LowercaseString
Registry
String
URL
ASN
UppercaseString
TLP
- class intelmq.lib.harmonization.ASN¶
Bases:
Integer
ASN type. Derived from Integer with forbidden values.
Only valid are: 0 < asn <= 4294967295. See https://en.wikipedia.org/wiki/Autonomous_system_(Internet): “The first and last ASNs of the original 16-bit integers, namely 0 and 65,535, and the last ASN of the 32-bit numbers, namely 4,294,967,295, are reserved and should not be used by operators.”
- static check_asn(value: int) bool ¶
- static is_valid(value: int, sanitize: bool = False) bool ¶
- static sanitize(value: int) Optional[int] ¶
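A short hedged usage sketch of the validity rule above.

from intelmq.lib.harmonization import ASN

print(ASN.is_valid(64496))  # True: within 0 < asn <= 4294967295
print(ASN.is_valid(0))      # False: 0 is reserved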
- class intelmq.lib.harmonization.Accuracy¶
Bases:
Float
Accuracy type. A Float between 0 and 100.
- static is_valid(value: float, sanitize: bool = False) bool ¶
- static sanitize(value: float) Optional[float] ¶
- class intelmq.lib.harmonization.Base64¶
Bases:
String
Base64 type. Always gives unicode strings.
Sanitation encodes to base64 and accepts binary and unicode strings.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) Optional[str] ¶
- class intelmq.lib.harmonization.Boolean¶
Bases:
GenericType
Boolean type. Without sanitation only python bool is accepted.
Sanitation accepts string ‘true’ and ‘false’ and integers 0 and 1.
- static is_valid(value: bool, sanitize: bool = False) bool ¶
- static sanitize(value: bool) Optional[bool] ¶
- class intelmq.lib.harmonization.ClassificationTaxonomy¶
Bases:
String
classification.taxonomy type.
The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/
- These old values are automatically mapped to the new ones:
‘abusive content’ -> ‘abusive-content’
‘information gathering’ -> ‘information-gathering’
‘intrusion attempts’ -> ‘intrusion-attempts’
‘malicious code’ -> ‘malicious-code’
- Allowed values are:
abusive-content
availability
fraud
information-content-security
information-gathering
intrusion-attempts
intrusions
malicious-code
other
test
vulnerable
- allowed_values = ['abusive-content', 'availability', 'fraud', 'information-content-security', 'information-gathering', 'intrusion-attempts', 'intrusions', 'malicious-code', 'other', 'test', 'vulnerable']¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) Optional[str] ¶
- class intelmq.lib.harmonization.ClassificationType¶
Bases:
String
classification.type type.
The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/ with extensions.
- These old values are automatically mapped to the new ones:
‘botnet drone’ -> ‘infected-system’
‘ids alert’ -> ‘ids-alert’
‘c&c’ -> ‘c2-server’
‘c2server’ -> ‘c2-server’
‘infected system’ -> ‘infected-system’
‘malware configuration’ -> ‘malware-configuration’
‘Unauthorised-information-access’ -> ‘unauthorised-information-access’
‘leak’ -> ‘data-leak’
‘vulnerable client’ -> ‘vulnerable-system’
‘vulnerable service’ -> ‘vulnerable-system’
‘ransomware’ -> ‘infected-system’
‘unknown’ -> ‘undetermined’
- These values changed their taxonomy:
- ‘malware’: Within the taxonomy ‘malicious-code’, such events can be either ‘infected-system’ or ‘malware-distribution’; as a value for actual malware, it now belongs to the taxonomy ‘other’.
- Allowed values are:
application-compromise
blacklist
brute-force
burglary
c2-server
copyright
data-leak
data-loss
ddos
ddos-amplifier
dga-domain
dos
exploit
harmful-speech
ids-alert
infected-system
information-disclosure
malware
malware-configuration
malware-distribution
masquerade
misconfiguration
other
outage
phishing
potentially-unwanted-accessible
privileged-account-compromise
proxy
sabotage
scanner
sniffing
social-engineering
spam
system-compromise
test
tor
unauthorised-information-access
unauthorised-information-modification
unauthorized-use-of-resources
undetermined
unprivileged-account-compromise
violence
vulnerable-system
weak-crypto
- allowed_values = ('application-compromise', 'blacklist', 'brute-force', 'burglary', 'c2-server', 'copyright', 'data-leak', 'data-loss', 'ddos', 'ddos-amplifier', 'dga-domain', 'dos', 'exploit', 'harmful-speech', 'ids-alert', 'infected-system', 'information-disclosure', 'malware', 'malware-configuration', 'malware-distribution', 'masquerade', 'misconfiguration', 'other', 'outage', 'phishing', 'potentially-unwanted-accessible', 'privileged-account-compromise', 'proxy', 'sabotage', 'scanner', 'sniffing', 'social-engineering', 'spam', 'system-compromise', 'test', 'tor', 'unauthorised-information-access', 'unauthorised-information-modification', 'unauthorized-use-of-resources', 'undetermined', 'unprivileged-account-compromise', 'violence', 'vulnerable-system', 'weak-crypto')¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) Optional[str] ¶
- class intelmq.lib.harmonization.DateTime¶
Bases:
String
Date and time type for timestamps.
Valid values are timestamps with time zone information in the format ‘%Y-%m-%dT%H:%M:%S+00:00’. Values missing the time or the timezone information (UTC) are invalid. Microseconds are also allowed.
Sanitation normalizes the timezone to UTC, which is the only allowed timezone.
The following additional conversions are available with the convert function:
timestamp
windows_nt: From Windows NT / AD / LDAP
epoch_millis: From Milliseconds since Epoch
from_format: From a given format, e.g. ‘from_format|%H %M %S %m %d %Y %Z’
from_format_midnight: Date from a given format and assume midnight, e.g. ‘from_format_midnight|%d-%m-%Y’
utc_isoformat: Parse date generated by datetime.isoformat()
fuzzy (or None): Use dateutils’ fuzzy parser, default if no specific parser is given
- TIME_CONVERSIONS = {'timestamp': <function DateTime.from_timestamp>, 'windows_nt': <function DateTime.from_windows_nt>, 'epoch_millis': <function DateTime.from_epoch_millis>, 'from_format': <function DateTime.convert_from_format>, 'from_format_midnight': <function DateTime.convert_from_format_midnight>, 'utc_isoformat': <function DateTime.parse_utc_isoformat>, 'fuzzy': <function DateTime.convert_fuzzy>, None: <function DateTime.convert_fuzzy>}¶
- static convert(value, format='fuzzy') str ¶
Converts date time strings according to the given format. If the timezone is not given or clear, the local time zone is assumed!
timestamp
windows_nt: From Windows NT / AD / LDAP
epoch_millis: From Milliseconds since Epoch
from_format: From a given format, e.g. ‘from_format|%H %M %S %m %d %Y %Z’
from_format_midnight: Date from a given format and assume midnight, e.g. ‘from_format_midnight|%d-%m-%Y’
utc_isoformat: Parse date generated by datetime.isoformat()
fuzzy (or None): Use dateutils’ fuzzy parser, default if no specific parser is given
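A hedged sketch of convert() usage; as noted above, the exact output can depend on the local time zone if the input carries no timezone information, so the shown values are illustrative:
>>> from intelmq.lib.harmonization import DateTime
>>> DateTime.convert('1599999999', format='timestamp')        # seconds since epoch
'2020-09-13T12:26:39+00:00'
>>> DateTime.convert('13-09-2020', format='from_format_midnight|%d-%m-%Y')
'2020-09-13T00:00:00+00:00'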
- static convert_from_format(value: str, format: str) str ¶
Converts a datetime with the given format.
- static convert_from_format_midnight(value: str, format: str) str ¶
Converts a date with the given format and adds time 00:00:00 to it.
- static convert_fuzzy(value) str ¶
- static from_epoch_millis(tstamp: Union[int, str]) str ¶
Returns an ISO-formatted datetime from the given epoch timestamp with milliseconds. The milliseconds are dropped, the value is converted into a normal timestamp and then processed.
- static from_timestamp(tstamp: Union[int, float, str]) str ¶
Returns ISO formatted datetime from given timestamp.
- static from_windows_nt(tstamp: int) str ¶
Converts the Windows NT / LDAP / Active Directory format to ISO format.
The format is: 100 nanoseconds (10^-7s) since 1601-01-01. UTC is assumed.
- Parameters
tstamp – Time in LDAP format as integer or string. Will be converted if necessary.
- Returns
Converted ISO format string
- static generate_datetime_now() str ¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- midnight = datetime.time(0, 0)¶
- static parse_utc_isoformat(value: str, return_datetime: bool = False) Union[datetime, str] ¶
Parse the format generated by the datetime.isoformat() method with UTC timezone. It is much faster than the universal dateutil parser. Can be used for parsing DateTime fields which are already parsed.
Returns a string with ISO format. If return_datetime is True, the return value is a datetime.datetime object.
- static sanitize(value: str) Optional[str] ¶
- class intelmq.lib.harmonization.FQDN¶
Bases:
String
Fully qualified domain name type.
All valid lowercase domains are accepted, no IP addresses or URLs. Trailing dot is not allowed.
To prevent values like ‘10.0.0.1:8080’ (#1235), we check for the non-existence of ‘:’.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) Optional[str] ¶
- static to_ip(value: str) Optional[str] ¶
- class intelmq.lib.harmonization.Float¶
Bases:
GenericType
Float type. Without sanitation only python float/integer/long is accepted. Boolean is explicitly denied.
Sanitation accepts strings and everything float() accepts.
- static is_valid(value: float, sanitize: bool = False) bool ¶
- static sanitize(value: float) Optional[float] ¶
- class intelmq.lib.harmonization.GenericType¶
Bases:
object
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value) Optional[str] ¶
- class intelmq.lib.harmonization.IPAddress¶
Bases:
String
Type for IP addresses, all families. Uses the ipaddress module.
Sanitation accepts integers, strings and objects of ipaddress.IPv4Address and ipaddress.IPv6Address.
Valid values are only strings. 0.0.0.0 is explicitly not allowed.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: Union[int, str]) Optional[str] ¶
- static to_int(value: str) Optional[int] ¶
- static to_reverse(ip_addr: str) str ¶
- static version(value: str) int ¶
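A small sketch of the IPAddress helpers, based on the method names and descriptions above (outputs illustrative):
>>> from intelmq.lib.harmonization import IPAddress
>>> IPAddress.is_valid('192.0.2.1')
True
>>> IPAddress.version('192.0.2.1')
4
>>> IPAddress.sanitize(3221225985)     # integers are accepted for sanitation
'192.0.2.1'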
- class intelmq.lib.harmonization.IPNetwork¶
Bases:
String
Type for IP networks, all families. Uses the ipaddress module.
Sanitation accepts strings and objects of ipaddress.IPv4Network and ipaddress.IPv6Network. If host bits in strings are set, they will be ignored (e.g. 127.0.0.1/32).
Valid values are only strings.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) Optional[str] ¶
- static version(value: str) int ¶
- class intelmq.lib.harmonization.Integer¶
Bases:
GenericType
Integer type. Without sanitation only python integer/long is accepted. Bool is explicitly denied.
Sanitation accepts strings and everything int() accepts.
- static is_valid(value: int, sanitize: bool = False) bool ¶
- static sanitize(value: int) Optional[int] ¶
- class intelmq.lib.harmonization.JSON¶
Bases:
String
JSON type.
Sanitation accepts any valid JSON objects.
Valid values are only unicode strings with JSON objects.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) Optional[str] ¶
- class intelmq.lib.harmonization.JSONDict¶
Bases:
JSON
JSONDict type.
Sanitation accepts pythons dictionaries and JSON strings.
Valid values are only unicode strings with JSON dictionaries.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static is_valid_subitem(value: str) bool ¶
- static sanitize(value: str) Optional[str] ¶
- static sanitize_subitem(value: str) str ¶
- class intelmq.lib.harmonization.LowercaseString¶
Bases:
String
Like string, but only allows lower case characters.
Sanitation lowers all characters.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) Optional[bool] ¶
- class intelmq.lib.harmonization.Registry¶
Bases:
UppercaseString
Registry type. Derived from UppercaseString.
Only valid values: AFRINIC, APNIC, ARIN, LACNIC, RIPE. RIPE-NCC and RIPENCC are normalized to RIPE.
- ENUM = ['AFRINIC', 'APNIC', 'ARIN', 'LACNIC', 'RIPE']¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str ¶
- class intelmq.lib.harmonization.String¶
Bases:
GenericType
Any non-empty string without leading or trailing whitespace.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- class intelmq.lib.harmonization.TLP¶
Bases:
UppercaseString
TLP level type. Derived from UppercaseString.
Only valid values: WHITE, GREEN, AMBER, RED.
Sanitation accepts different cases and the prefix ‘tlp:’.
- enum = ['WHITE', 'GREEN', 'AMBER', 'RED']¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- prefix_pattern = re.compile('^(TLP:?)?\\s*')¶
- static sanitize(value: str) Optional[str] ¶
- class intelmq.lib.harmonization.URL¶
Bases:
String
URI type. Local and remote.
Sanitation converts hxxp and hxxps to http and https. For local URIs (file) a missing host is replaced by localhost.
Valid values must have the host (network location part).
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) Optional[str] ¶
- static to_domain_name(url: str) Optional[str] ¶
- static to_ip(url: str) Optional[str] ¶
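A short sketch of the documented URL sanitation, assuming an installed intelmq package (outputs illustrative):
>>> from intelmq.lib.harmonization import URL
>>> URL.sanitize('hxxps://example.com/index.html')   # defanged scheme is restored
'https://example.com/index.html'
>>> URL.to_domain_name('https://example.com/index.html')
'example.com'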
Messages are the information packages in pipelines.
Use MessageFactory to get a Message object (types Report and Event).
- class intelmq.lib.message.Event(message: Union[dict, tuple] = (), auto: bool = False, harmonization: Optional[dict] = None)¶
Bases:
Message
- __init__(message: Union[dict, tuple] = (), auto: bool = False, harmonization: Optional[dict] = None) None ¶
- Parameters
message – If a report is given, its feed.name, feed.url and time.observation will be used to construct the Event. If it’s another type, the value is passed to dict’s init
auto – unused here
harmonization – Harmonization definition to use
- class intelmq.lib.message.Message(message: Union[dict, tuple] = (), auto: bool = False, harmonization: Optional[dict] = None)¶
Bases:
dict
- add(key: str, value: str, sanitize: bool = True, overwrite: Optional[bool] = None, ignore: Sequence = (), raise_failure: bool = True) Optional[bool] ¶
Add a value for the key (after sanitation).
- Parameters
key – Key as defined in the harmonization
value – A valid value as defined in the harmonization. If the value is None or in _IGNORED_VALUES, it will be ignored. If the value is ignored, the key exists and overwrite is True, the key is deleted.
sanitize – Sanitation of harmonization type will be called before validation (default: True)
overwrite – Overwrite an existing value (default: None). If True, an existing value is overwritten; if False, it is kept; if None, intelmq.lib.exceptions.KeyExists is raised for an existing value.
raise_failure – Whether an intelmq.lib.exceptions.InvalidValue should be raised for invalid values (default: True). If False, the method returns False for invalid values instead.
- Returns
True if the value has been added.
False if the value is invalid and raise_failure is False, or if the value already existed and has not been overwritten.
None if the value has been ignored.
- Raises
intelmq.lib.exceptions.KeyExists – If key exists and won’t be overwritten explicitly.
intelmq.lib.exceptions.InvalidKey – if key is invalid.
intelmq.lib.exceptions.InvalidArgument – if ignore is not list or tuple.
intelmq.lib.exceptions.InvalidValue – If value is not valid for the given key and raise_failure is True.
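A minimal sketch of add() and the overwrite semantics, assuming the default harmonization configuration can be loaded; the exception output is abbreviated:
>>> from intelmq.lib.message import Event
>>> event = Event()
>>> event.add('source.ip', '192.0.2.1')
True
>>> event.add('source.ip', '192.0.2.2')        # key exists, overwrite is None
Traceback (most recent call last):
  ...
intelmq.lib.exceptions.KeyExists: ...
>>> event.add('source.ip', '192.0.2.2', overwrite=True)
True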
- change(key: str, value: str, sanitize: bool = True)¶
- copy() a shallow copy of D ¶
- deep_copy()¶
- finditems(keyword: str)¶
- get(key, default=None)¶
Return the value for key if key is in the dictionary, else default.
- hash(*, filter_keys: Iterable = frozenset({}), filter_type: str = 'blacklist')¶
Return a SHA256 hash of the message as a hexadecimal string. The hash is computed over almost all key/value pairs. Depending on the filter_type parameter (blacklist or whitelist), the keys defined in the filter_keys parameter are either ignored or the only ones considered. If given, filter_keys should be a set.
‘time.observation’ will always be ignored.
- is_valid(key: str, value: str, sanitize: bool = True) bool ¶
Checks if a value is valid for the key (after sanitation).
- Parameters
key – Key of the field
value – Value of the field
sanitize – Sanitation of harmonization type will be called before validation (default: True)
- Returns
True if the value is valid, otherwise False
- Raises
intelmq.lib.exceptions.InvalidKey – if given key is invalid.
- serialize()¶
- set_default_value(value: Optional[Any] = None)¶
Sets a default value for items.
- to_dict(hierarchical: bool = False, with_type: bool = False, jsondict_as_string: bool = False) dict ¶
Returns a copy of self, only based on a dict class.
- Parameters
hierarchical – Split all keys at a dot and save these subitems in dictionaries.
with_type – Add a value named __type containing the message type
jsondict_as_string – If False (default), treat values in JSONDict fields just as normal ones. If True, save such fields as JSON-encoded strings. This is the old behavior before version 1.1.
- Returns
A dictionary as a copy of itself, modified according to the given parameters
- Return type
new_dict
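Continuing the Event sketch above, hierarchical output splits keys at dots into nested dictionaries (shown values are illustrative; the exact key set depends on the event):
>>> event.to_dict()
{'source.ip': '192.0.2.2'}
>>> event.to_dict(hierarchical=True)
{'source': {'ip': '192.0.2.2'}}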
- to_json(hierarchical=False, with_type=False, jsondict_as_string=False)¶
- static unserialize(message_string: str)¶
- update([E, ]**F) None. Update D from dict/iterable E and F. ¶
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
- class intelmq.lib.message.MessageFactory¶
Bases:
object
unserialize: JSON-encoded message to object
serialize: object to JSON-encoded message
- static from_dict(message: dict, harmonization=None, default_type: Optional[str] = None) dict ¶
Takes a dictionary holding the message and returns an instance of the correct class.
- Parameters
message – the message which should be converted to a Message object
harmonization – a dictionary holding the used harmonization
default_type – If ‘__type’ is not present in message, the given type will be used
See also
MessageFactory.unserialize MessageFactory.serialize
- static serialize(message)¶
Takes an instance of a message-derived class and creates a JSON-encoded Message.
The class is saved in the __type attribute.
- static unserialize(raw_message: str, harmonization: Optional[dict] = None, default_type: Optional[str] = None) dict ¶
Takes JSON-encoded Message object, returns instance of correct class.
- Parameters
raw_message – the message which should be converted to a Message object
harmonization – a dictionary holding the used harmonization
default_type – If ‘__type’ is not present in message, the given type will be used
See also
MessageFactory.from_dict MessageFactory.serialize
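A hedged round-trip sketch with MessageFactory, continuing the Event example above and assuming the default harmonization can be loaded:
>>> from intelmq.lib.message import MessageFactory
>>> raw = MessageFactory.serialize(event)      # JSON string, class stored in '__type'
>>> restored = MessageFactory.unserialize(raw)
>>> type(restored).__name__
'Event'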
- class intelmq.lib.message.Report(message: Union[dict, tuple] = (), auto: bool = False, harmonization: Optional[dict] = None)¶
Bases:
Message
- __init__(message: Union[dict, tuple] = (), auto: bool = False, harmonization: Optional[dict] = None) None ¶
- Parameters
message – Passed along to Message’s and dict’s init. If this is an instance of the Event class, the resulting Report instance has only the fields which are possible in Report, all others are stripped.
auto – if False (default), time.observation is automatically added.
harmonization – Harmonization definition to use
- copy() a shallow copy of D ¶
- class intelmq.lib.pipeline.Amqp(logger, pipeline_args: Optional[dict] = None, load_balance=False, is_multithreaded=False)¶
Bases:
Pipeline
- check_connection()¶
- clear_queue(queue: str) bool ¶
- connect()¶
- count_queued_messages(*queues) dict ¶
- destination_pipeline_amqp_exchange = ''¶
- destination_pipeline_amqp_virtual_host = '/'¶
- destination_pipeline_db = 2¶
- destination_pipeline_host = '127.0.0.1'¶
- destination_pipeline_password = None¶
- destination_pipeline_socket_timeout = None¶
- destination_pipeline_ssl = False¶
- destination_pipeline_username = None¶
- disconnect()¶
- intelmqctl_rabbitmq_monitoring_url = None¶
- load_configurations(queues_type)¶
- nonempty_queues() set ¶
- queue_args = {'x-queue-mode': 'lazy'}¶
- send(message: str, path: str = '_default', path_permissive: bool = False)¶
In principle we could use AMQP’s exchanges here, but that architecture is incompatible with the format of our pipeline configuration.
- set_queues(queues: dict, queues_type: str)¶
- Parameters
queues – For the source queue, it’s just a string. For destination queues, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
queues_type – “source” or “destination”
The method ensures self.destination_queues is in the form of a dict of lists. It does not ensure there is a ‘_default’ key.
- setup_channel()¶
- source_pipeline_amqp_exchange = ''¶
- source_pipeline_amqp_virtual_host = '/'¶
- source_pipeline_db = 2¶
- source_pipeline_host = '127.0.0.1'¶
- source_pipeline_password = None¶
- source_pipeline_socket_timeout = None¶
- source_pipeline_ssl = False¶
- source_pipeline_username = None¶
- class intelmq.lib.pipeline.Pipeline(logger, pipeline_args: Optional[dict] = None, load_balance=False, is_multithreaded=False)¶
Bases:
object
- acknowledge()¶
Acknowledge/delete the current message from the source queue
- Raises
intelmq.lib.exceptions.PipelineError – If no message is held
- Returns
None
- clear_queue(queue)¶
- connect()¶
- disconnect()¶
- has_internal_queues = False¶
- nonempty_queues() set ¶
- receive() str ¶
- reject_message()¶
- send(message: str, path: str = '_default', path_permissive: bool = False)¶
- set_queues(queues: Optional[str], queues_type: str)¶
- Parameters
queues – For the source queue, it’s just a string. For destination queues, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
queues_type – “source” or “destination”
The method ensures self.destination_queues is in the form of a dict of lists. It does not ensure there is a ‘_default’ key.
- class intelmq.lib.pipeline.PipelineFactory¶
Bases:
object
- static create(logger, broker=None, direction=None, queues=None, pipeline_args: Optional[dict] = None, load_balance=False, is_multithreaded=False)¶
direction: “source” or “destination”, optional, needed for queues
queues: needs direction to be set, calls set_queues
bot: Bot instance
- class intelmq.lib.pipeline.Pythonlist(logger, pipeline_args: Optional[dict] = None, load_balance=False, is_multithreaded=False)¶
Bases:
Pipeline
This pipeline uses simple lists and is only for testing purposes.
It behaves in most ways like a normal pipeline would, but works entirely without external modules and programs. Data is saved as it comes (no conversion) and it is not blocking.
- _acknowledge()¶
Removes a message from the internal queue and returns it
- _receive() bytes ¶
Receives the last not yet acknowledged message.
Does not block unlike the other pipelines.
- _reject_message()¶
No-op because of the internal queue
- clear_queue(queue)¶
Empties given queue.
- connect()¶
- count_queued_messages(*queues) dict ¶
Returns the number of queued messages over all given queue names.
- disconnect()¶
- send(message: str, path: str = '_default', path_permissive: bool = False)¶
Sends a message to the destination queues
- set_queues(queues, queues_type)¶
- Parameters
queues – For the source queue, it’s just a string. For destination queues, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
queues_type – “source” or “destination”
The method ensures self.destination_queues is in the form of a dict of lists. It does not ensure there is a ‘_default’ key.
- state = {}¶
- class intelmq.lib.pipeline.Redis(logger, pipeline_args: Optional[dict] = None, load_balance=False, is_multithreaded=False)¶
Bases:
Pipeline
- _reject_message()¶
Rejecting is a no-op as the message is in the internal queue anyway.
- clear_queue(queue)¶
Clears a queue by removing (deleting) the key, which is the same as an empty list in Redis
- connect()¶
- count_queued_messages(*queues) dict ¶
- destination_pipeline_db = 2¶
- destination_pipeline_host = '127.0.0.1'¶
- destination_pipeline_password = None¶
- disconnect()¶
- has_internal_queues = True¶
- load_configurations(queues_type)¶
- nonempty_queues() set ¶
Returns a list of all currently non-empty queues.
- pipe = None¶
- send(message: str, path: str = '_default', path_permissive: bool = False)¶
- set_queues(queues, queues_type)¶
- Parameters
queues – For the source queue, it’s just a string. For destination queues, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
queues_type – “source” or “destination”
The method ensures self.destination_queues is in the form of a dict of lists. It does not ensure there is a ‘_default’ key.
- source_pipeline_db = 2¶
- source_pipeline_host = '127.0.0.1'¶
- source_pipeline_password = None¶
- class intelmq.lib.processmanager.IntelMQProcessManager(*args, **kwargs)¶
Bases:
ProcessManagerInterface
- PIDDIR = '/opt/intelmq/var/run/'¶
- PIDFILE = '/opt/intelmq/var/run/{}.pid'¶
- static _interpret_commandline(pid: int, cmdline: Iterable[str], module: str, bot_id: str) Union[bool, str] ¶
Separate function to allow easy testing
Parameters¶
- pid : int
Process ID, used for return values (error messages) only.
- cmdline : Iterable[str]
The command line of the process.
- module : str
The module of the bot.
- bot_id : str
The ID of the bot.
Returns¶
- Union[bool, str]
DESCRIPTION.
- bot_reload(bot_id, getstatus=True)¶
- bot_run(bot_id, run_subcommand=None, console_type=None, message_action_kind=None, dryrun=None, msg=None, show_sent=None, loglevel=None)¶
- bot_start(bot_id, getstatus=True)¶
- bot_status(bot_id, *, proc=None)¶
- bot_stop(bot_id, getstatus=True)¶
- class intelmq.lib.processmanager.ProcessManagerInterface(interactive: bool, runtime_configuration: dict, logger: Logger, returntype: ReturnType, quiet: bool)¶
Bases:
object
Defines an interface all process managers must adhere to
- abstract bot_reload(bot_id: str, getstatus=True)¶
- abstract bot_run(bot_id: str, run_subcommand=None, console_type=None, message_action_kind=None, dryrun=None, msg=None, show_sent=None, loglevel=None)¶
- abstract bot_start(bot_id: str, getstatus=True)¶
- abstract bot_status(bot_id: str) str ¶
- abstract bot_stop(bot_id: str, getstatus=True)¶
- class intelmq.lib.processmanager.SupervisorProcessManager(interactive: bool, runtime_configuration: dict, logger: Logger, returntype: ReturnType, quiet: bool)¶
Bases:
ProcessManagerInterface
- DEFAULT_SOCKET_PATH = '/var/run/supervisor.sock'¶
- class ProcessState¶
Bases:
object
- BACKOFF = 30¶
- EXITED = 100¶
- FATAL = 200¶
- RUNNING = 20¶
- STARTING = 10¶
- STOPPED = 0¶
- STOPPING = 40¶
- UNKNOWN = 1000¶
- static is_running(state: int) bool ¶
- class RpcFaults¶
Bases:
object
- ABNORMAL_TERMINATION = 40¶
- ALREADY_ADDED = 90¶
- ALREADY_STARTED = 60¶
- BAD_ARGUMENTS = 3¶
- BAD_NAME = 10¶
- BAD_SIGNAL = 11¶
- CANT_REREAD = 92¶
- FAILED = 30¶
- INCORRECT_PARAMETERS = 2¶
- NOT_EXECUTABLE = 21¶
- NOT_RUNNING = 70¶
- NO_FILE = 20¶
- SHUTDOWN_STATE = 6¶
- SIGNATURE_UNSUPPORTED = 4¶
- SPAWN_ERROR = 50¶
- STILL_RUNNING = 91¶
- SUCCESS = 80¶
- UNKNOWN_METHOD = 1¶
- SUPERVISOR_GROUP = 'intelmq'¶
- bot_reload(bot_id: str, getstatus: bool = True)¶
- bot_run(bot_id, run_subcommand=None, console_type=None, message_action_kind=None, dryrun=None, msg=None, show_sent=None, loglevel=None)¶
- bot_start(bot_id: str, getstatus: bool = True)¶
- bot_status(bot_id: str) str ¶
- bot_stop(bot_id: str, getstatus: bool = True)¶
- intelmq.lib.processmanager.process_managers()¶
Creates a list of the process managers in this module that implement the ProcessManagerInterface. Returns a dict with a short identifier of the process manager as key and the class as value: {‘intelmq’: intelmq.lib.processmanager.IntelMQProcessManager, ‘supervisor’: intelmq.lib.processmanager.SupervisorProcessManager}
Support for splitting large raw reports into smaller ones.
The main intention of this module is to help work around limitations in Redis, which limits strings to 512 MB. Collector bots can use the functions in this module to split the incoming data into smaller pieces which can be sent as separate reports.
Collectors usually don’t really know anything about the data they collect, so the data cannot be reliably split into pieces in all cases. This module can be used for those cases, though, where users know that the data is actually in a line-based format and can easily be split into pieces at newline characters. For this to work, some assumptions are made:
The data can be split at any newline character
This would not work for, e.g., CSV-based formats which allow newlines in values as long as they’re within quotes.
The lines are much shorter than the maximum chunk size
Obviously, if this condition does not hold, it may not be possible to split the data into small enough chunks at newline characters.
Other considerations:
To accommodate CSV formats, the code can optionally replicate the first line of the file at the start of all chunks.
The Redis limit applies to the entire IntelMQ report, not just the raw data. The report has some metadata in addition to the raw data, and the raw data is encoded as base64 in the report. The maximum chunk size must take this into account, by multiplying the actual limit by 3/4 and subtracting a generous amount for the metadata.
- intelmq.lib.splitreports.generate_reports(report_template: Report, infile: BinaryIO, chunk_size: Optional[int], copy_header_line: bool) Generator[Report, None, None] ¶
Generate reports from a template and input file, optionally split into chunks.
If chunk_size is None, a single report is generated with the entire contents of infile as the raw data. Otherwise chunk_size should be an integer giving the maximum number of bytes in a chunk. The data read from infile is then split into chunks of this size at newline characters (see read_delimited_chunks). For each of the chunks, this function yields a copy of the report_template with that chunk as the value of the raw attribute.
When splitting the data into chunks, if copy_header_line is true, the first line of the file is read before chunking and then prepended to each of the chunks. This is particularly useful when splitting CSV files.
The infile should be a file-like object. generate_reports uses only two methods, readline and read, with readline only called once and only if copy_header_line is true. Both methods should return bytes objects.
- Params:
report_template: report used as template for all yielded copies
infile: stream to read from
chunk_size: maximum size of each chunk
copy_header_line: copy the first line of the infile to each chunk
- Yields
report – a Report object holding the chunk in the raw field
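A minimal sketch of splitting a line-based input into several reports; the feed name and input data are placeholders, and the default harmonization is assumed to be loadable:
import io
from intelmq.lib.message import Report
from intelmq.lib.splitreports import generate_reports

template = Report()
template.add('feed.name', 'example-feed')       # placeholder feed name

data = io.BytesIO(b'header\nline1\nline2\nline3\n')
# Chunks of at most 16 bytes; the header line is prepended to every chunk:
for report in generate_reports(template, data, chunk_size=16, copy_header_line=True):
    print(report['raw'])                        # base64-encoded chunk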
- intelmq.lib.splitreports.read_delimited_chunks(infile: BinaryIO, chunk_size: int) Generator[bytes, None, None] ¶
Yield the contents of infile in chunk_size pieces ending at newlines. The individual pieces, except for the last one, end in newlines and are smaller than chunk_size if possible.
- Params:
infile: stream to read from
chunk_size: maximum size of each chunk
- Yields
chunk – chunk with maximum size of chunk_size if possible
- intelmq.lib.splitreports.split_chunks(chunk: bytes, chunk_size: int) List[bytes] ¶
Split a bytestring into chunk_size pieces at ASCII newline characters.
The return value is a list of bytestring objects. Concatenating all of them yields a bytestring equal to the input string. All items in the list except the last end in a newline. The items are shorter than chunk_size if possible, but may be longer if the input data has places where the distance between two newline characters is too long.
Note in particular, that the last item may not end in a newline!
- Params:
chunk: the string to be split
chunk_size: maximum size of each chunk
- Returns
List of resulting chunks
- Return type
chunks
Utilities for testing intelmq bots.
The BotTestCase can be used as base class for unittests on bots. It includes some basic generic tests (logged errors, correct pipeline setup).
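A minimal sketch of a bot test following the pattern used by IntelMQ’s own test suite; the imported bot module and the example message are placeholders:
import unittest

import intelmq.lib.test as test
from intelmq.bots.experts.example.expert import ExampleExpertBot  # hypothetical bot

EXAMPLE_INPUT = {'__type': 'Event',
                 'source.ip': '192.0.2.1',
                 'time.observation': '2015-01-01T00:00:00+00:00'}

class TestExampleExpertBot(test.BotTestCase, unittest.TestCase):
    @classmethod
    def set_bot(cls):
        # Tell the test case which bot class to instantiate and run.
        cls.bot_reference = ExampleExpertBot

    def test_event(self):
        self.input_message = EXAMPLE_INPUT
        self.run_bot()
        self.assertOutputQueueLen(1)

if __name__ == '__main__':
    unittest.main()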
- class intelmq.lib.test.BotTestCase¶
Bases:
object
Provides common tests and assert methods for bot testing.
- assertAnyLoglineEqual(message: str, levelname: str = 'ERROR')¶
Asserts if any logline matches a specific requirement.
- Parameters
message – Message text which is compared
levelname – Log level of the logline which is asserted
- Raises
ValueError – if logline message has not been found
- assertLogMatches(pattern: str, levelname: str = 'ERROR')¶
Asserts if any logline matches a specific requirement.
- Parameters
pattern – Message text which is compared, regular expression.
levelname – Log level of the logline which is asserted, upper case.
- assertLoglineEqual(line_no: int, message: str, levelname: str = 'ERROR')¶
Asserts if a logline matches a specific requirement.
- Parameters
line_no – Number of the logline which is asserted
message – Message text which is compared
levelname – Log level of logline which is asserted
- assertLoglineMatches(line_no: int, pattern: str, levelname: str = 'ERROR')¶
Asserts if a logline matches a specific requirement.
- Parameters
line_no – Number of the logline which is asserted
pattern – Message text which is compared
levelname – Log level of the logline which is asserted
- assertMessageEqual(queue_pos, expected_msg, compare_raw=True, path='_default')¶
Asserts that the given expected_message is contained in the generated event with given queue position.
- assertNotRegexpMatchesLog(pattern)¶
Asserts that pattern doesn’t match against log.
- assertOutputQueueLen(queue_len=0, path='_default')¶
Asserts that the output queue has the expected length.
- assertRegexpMatchesLog(pattern)¶
Asserts that pattern matches against log.
- bot_types = {'collector': 'CollectorBot', 'expert': 'ExpertBot', 'output': 'OutputBot', 'parser': 'ParserBot'}¶
- get_input_internal_queue()¶
Returns the internal input queue of this bot which can be filled with fixture data in setUp()
- get_input_queue()¶
Returns the input queue of this bot which can be filled with fixture data in setUp()
- get_mocked_logger(logger)¶
- get_output_queue(path='_default')¶
Getter for items in the output queues of this bot. Use in TestCase scenarios. If there are multiple queues in a named queue group, all their items are returned chained.
- harmonization = {'event': {'classification.identifier': {'description': 'The lowercase identifier defines the actual software or service (e.g. ``heartbleed`` or ``ntp_version``) or standardized malware name (e.g. ``zeus``). Note that you MAY overwrite this field during processing for your individual setup. This field is not standardized across IntelMQ setups/users.', 'type': 'String'}, 'classification.taxonomy': {'description': 'We recognize the need for the CSIRT teams to apply a static (incident) taxonomy to abuse data. With this goal in mind the type IOC will serve as a basis for this activity. Each value of the dynamic type mapping translates to a an element in the static taxonomy. The European CSIRT teams for example have decided to apply the eCSIRT.net incident classification. The value of the taxonomy key is thus a derivative of the dynamic type above. For more information about check `ENISA taxonomies <http://www.enisa.europa.eu/activities/cert/support/incident-management/browsable/incident-handling-process/incident-taxonomy/existing-taxonomies>`_.', 'length': 100, 'type': 'ClassificationTaxonomy'}, 'classification.type': {'description': 'The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid *type explosion*, which in turn dilutes the business value of the dynamic typing. In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.', 'type': 'ClassificationType'}, 'comment': {'description': 'Free text commentary about the abuse event inserted by an analyst.', 'type': 'String'}, 'destination.abuse_contact': {'description': 'Abuse contact for destination address. A comma separated list.', 'type': 'LowercaseString'}, 'destination.account': {'description': 'An account name or email address, which has been identified to relate to the destination of an abuse event.', 'type': 'String'}, 'destination.allocated': {'description': 'Allocation date corresponding to BGP prefix.', 'type': 'DateTime'}, 'destination.as_name': {'description': 'The autonomous system name to which the connection headed.', 'type': 'String'}, 'destination.asn': {'description': 'The autonomous system number to which the connection headed.', 'type': 'ASN'}, 'destination.domain_suffix': {'description': 'The suffix of the domain from the public suffix list.', 'type': 'FQDN'}, 'destination.fqdn': {'description': 'A DNS name related to the host from which the connection originated. DNS allows even binary data in DNS, so we have to allow everything. 
A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'destination.geolocation.cc': {'description': 'Country-Code according to ISO3166-1 alpha-2 for the destination IP.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'destination.geolocation.city': {'description': 'Some geolocation services refer to city-level geolocation.', 'type': 'String'}, 'destination.geolocation.country': {'description': 'The country name derived from the ISO3166 country code (assigned to cc field).', 'type': 'String'}, 'destination.geolocation.latitude': {'description': 'Latitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'destination.geolocation.longitude': {'description': 'Longitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'destination.geolocation.region': {'description': 'Some geolocation services refer to region-level geolocation.', 'type': 'String'}, 'destination.geolocation.state': {'description': 'Some geolocation services refer to state-level geolocation.', 'type': 'String'}, 'destination.ip': {'description': 'The IP which is the target of the observed connections.', 'type': 'IPAddress'}, 'destination.local_hostname': {'description': 'Some sources report an internal hostname within a NAT related to the name configured for a compromised system', 'type': 'String'}, 'destination.local_ip': {'description': 'Some sources report an internal (NATed) IP address related a compromised system. N.B. RFC1918 IPs are OK here.', 'type': 'IPAddress'}, 'destination.network': {'description': 'CIDR for an autonomous system. Also known as BGP prefix. If multiple values are possible, select the most specific.', 'type': 'IPNetwork'}, 'destination.port': {'description': 'The port to which the connection headed.', 'type': 'Integer'}, 'destination.registry': {'description': 'The IP registry a given ip address is allocated by.', 'length': 7, 'type': 'Registry'}, 'destination.reverse_dns': {'description': 'Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'destination.tor_node': {'description': 'If the destination IP was a known tor node.', 'type': 'Boolean'}, 'destination.url': {'description': 'A URL denotes on IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource.', 'type': 'URL'}, 'destination.urlpath': {'description': 'The path portion of an HTTP or related network request.', 'type': 'String'}, 'event_description.target': {'description': 'Some sources denominate the target (organization) of a an attack.', 'type': 'String'}, 'event_description.text': {'description': 'A free-form textual description of an abuse event.', 'type': 'String'}, 'event_description.url': {'description': 'A description URL is a link to a further description of the the abuse event in question.', 'type': 'URL'}, 'event_hash': {'description': 'Computed event hash with specific keys and values that identify a unique event. At present, the hash should default to using the SHA1 function. 
Please note that for an event hash to be able to match more than one event (deduplication) the receiver of an event should calculate it based on a minimal set of keys and values present in the event. Using for example the observation time in the calculation will most likely render the checksum useless for deduplication purposes.', 'length': 40, 'regex': '^[A-F0-9./]+$', 'type': 'UppercaseString'}, 'extra': {'description': 'All anecdotal information, which cannot be parsed into the data harmonization elements. E.g. os.name, os.version, etc. **Note**: this is only intended for mapping any fields which can not map naturally into the data harmonization. It is not intended for extending the data harmonization with your own fields.', 'type': 'JSONDict'}, 'feed.accuracy': {'description': 'A float between 0 and 100 that represents how accurate the data in the feed is', 'type': 'Accuracy'}, 'feed.code': {'description': 'Code name for the feed, e.g. DFGS, HSDAG etc.', 'length': 100, 'type': 'String'}, 'feed.documentation': {'description': 'A URL or hint where to find the documentation of this feed.', 'type': 'String'}, 'feed.name': {'description': 'Name for the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.provider': {'description': 'Name for the provider of the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.url': {'description': 'The URL of a given abuse feed, where applicable', 'type': 'URL'}, 'malware.hash.md5': {'description': 'A string depicting an MD5 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.hash.sha1': {'description': 'A string depicting a SHA1 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.hash.sha256': {'description': 'A string depicting a SHA256 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.name': {'description': 'The malware name in lower case.', 'regex': '^[ -~]+$', 'type': 'LowercaseString'}, 'malware.version': {'description': 'A version string for an identified artifact generation, e.g. a crime-ware kit.', 'regex': '^[ -~]+$', 'type': 'String'}, 'misp.attribute_uuid': {'description': 'MISP - Malware Information Sharing Platform & Threat Sharing UUID of an attribute.', 'length': 36, 'regex': '^[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}$', 'type': 'LowercaseString'}, 'misp.event_uuid': {'description': 'MISP - Malware Information Sharing Platform & Threat Sharing UUID.', 'length': 36, 'regex': '^[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[0-9a-z]{12}$', 'type': 'LowercaseString'}, 'output': {'description': 'Event data converted into foreign format, intended to be exported by output plugin.', 'type': 'JSON'}, 'protocol.application': {'description': 'e.g. vnc, ssh, sip, irc, http or smtp.', 'length': 100, 'regex': '^[ -~]+$', 'type': 'LowercaseString'}, 'protocol.transport': {'description': 'e.g. 
tcp, udp, icmp.', 'iregex': '^(ip|icmp|igmp|ggp|ipencap|st2|tcp|cbt|egp|igp|bbn-rcc|nvp(-ii)?|pup|argus|emcon|xnet|chaos|udp|mux|dcn|hmp|prm|xns-idp|trunk-1|trunk-2|leaf-1|leaf-2|rdp|irtp|iso-tp4|netblt|mfe-nsp|merit-inp|sep|3pc|idpr|xtp|ddp|idpr-cmtp|tp\\+\\+|il|ipv6|sdrp|ipv6-route|ipv6-frag|idrp|rsvp|gre|mhrp|bna|esp|ah|i-nlsp|swipe|narp|mobile|tlsp|skip|ipv6-icmp|ipv6-nonxt|ipv6-opts|cftp|sat-expak|kryptolan|rvd|ippc|sat-mon|visa|ipcv|cpnx|cphb|wsn|pvp|br-sat-mon|sun-nd|wb-mon|wb-expak|iso-ip|vmtp|secure-vmtp|vines|ttp|nsfnet-igp|dgp|tcf|eigrp|ospf|sprite-rpc|larp|mtp|ax.25|ipip|micp|scc-sp|etherip|encap|gmtp|ifmp|pnni|pim|aris|scps|qnx|a/n|ipcomp|snp|compaq-peer|ipx-in-ip|vrrp|pgm|l2tp|ddx|iatp|st|srp|uti|smp|sm|ptp|isis|fire|crtp|crdup|sscopmce|iplt|sps|pipe|sctp|fc|divert)$', 'length': 11, 'type': 'LowercaseString'}, 'raw': {'description': 'The original line of the event from encoded in base64.', 'type': 'Base64'}, 'rtir_id': {'description': 'Request Tracker Incident Response ticket id.', 'type': 'Integer'}, 'screenshot_url': {'description': 'Some source may report URLs related to a an image generated of a resource without any metadata. Or an URL pointing to resource, which has been rendered into a webshot, e.g. a PNG image and the relevant metadata related to its retrieval/generation.', 'type': 'URL'}, 'source.abuse_contact': {'description': 'Abuse contact for source address. A comma separated list.', 'type': 'LowercaseString'}, 'source.account': {'description': 'An account name or email address, which has been identified to relate to the source of an abuse event.', 'type': 'String'}, 'source.allocated': {'description': 'Allocation date corresponding to BGP prefix.', 'type': 'DateTime'}, 'source.as_name': {'description': 'The autonomous system name from which the connection originated.', 'type': 'String'}, 'source.asn': {'description': 'The autonomous system number from which originated the connection.', 'type': 'ASN'}, 'source.domain_suffix': {'description': 'The suffix of the domain from the public suffix list.', 'type': 'FQDN'}, 'source.fqdn': {'description': 'A DNS name related to the host from which the connection originated. DNS allows even binary data in DNS, so we have to allow everything. 
A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'source.geolocation.cc': {'description': 'Country-Code according to ISO3166-1 alpha-2 for the source IP.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.city': {'description': 'Some geolocation services refer to city-level geolocation.', 'type': 'String'}, 'source.geolocation.country': {'description': 'The country name derived from the ISO3166 country code (assigned to cc field).', 'type': 'String'}, 'source.geolocation.cymru_cc': {'description': 'The country code denoted for the ip by the Team Cymru asn to ip mapping service.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.geoip_cc': {'description': 'MaxMind Country Code (ISO3166-1 alpha-2).', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.latitude': {'description': 'Latitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'source.geolocation.longitude': {'description': 'Longitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'source.geolocation.region': {'description': 'Some geolocation services refer to region-level geolocation.', 'type': 'String'}, 'source.geolocation.state': {'description': 'Some geolocation services refer to state-level geolocation.', 'type': 'String'}, 'source.ip': {'description': 'The ip observed to initiate the connection', 'type': 'IPAddress'}, 'source.local_hostname': {'description': 'Some sources report a internal hostname within a NAT related to the name configured for a compromised system', 'type': 'String'}, 'source.local_ip': {'description': 'Some sources report a internal (NATed) IP address related a compromised system. N.B. RFC1918 IPs are OK here.', 'type': 'IPAddress'}, 'source.network': {'description': 'CIDR for an autonomous system. Also known as BGP prefix. If multiple values are possible, select the most specific.', 'type': 'IPNetwork'}, 'source.port': {'description': 'The port from which the connection originated.', 'length': 5, 'type': 'Integer'}, 'source.registry': {'description': 'The IP registry a given ip address is allocated by.', 'length': 7, 'type': 'Registry'}, 'source.reverse_dns': {'description': 'Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'source.tor_node': {'description': 'If the source IP was a known tor node.', 'type': 'Boolean'}, 'source.url': {'description': 'A URL denotes an IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource.', 'type': 'URL'}, 'source.urlpath': {'description': 'The path portion of an HTTP or related network request.', 'type': 'String'}, 'status': {'description': 'Status of the malicious resource (phishing, dropzone, etc), e.g. 
online, offline.', 'type': 'String'}, 'time.observation': {'description': 'The time the collector of the local instance processed (observed) the event.', 'type': 'DateTime'}, 'time.source': {'description': 'The time of occurrence of the event as reported the feed (source).', 'type': 'DateTime'}, 'tlp': {'description': 'Traffic Light Protocol level of the event.', 'type': 'TLP'}}, 'report': {'extra': {'description': 'All anecdotal information of the report, which cannot be parsed into the data harmonization elements. E.g. subject of mails, etc. This is data is not automatically propagated to the events.', 'type': 'JSONDict'}, 'feed.accuracy': {'description': 'A float between 0 and 100 that represents how accurate the data in the feed is', 'type': 'Accuracy'}, 'feed.code': {'description': 'Code name for the feed, e.g. DFGS, HSDAG etc.', 'length': 100, 'type': 'String'}, 'feed.documentation': {'description': 'A URL or hint where to find the documentation of this feed.', 'type': 'String'}, 'feed.name': {'description': 'Name for the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.provider': {'description': 'Name for the provider of the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.url': {'description': 'The URL of a given abuse feed, where applicable', 'type': 'URL'}, 'raw': {'description': 'The original raw and unparsed data encoded in base64.', 'type': 'Base64'}, 'rtir_id': {'description': 'Request Tracker Incident Response ticket id.', 'type': 'Integer'}, 'time.observation': {'description': 'The time the collector of the local instance processed (observed) the event.', 'type': 'DateTime'}}}¶
- property input_queue¶
Returns the input queue of this bot which can be filled with fixture data in setUp()
- new_event()¶
- new_report(auto=False, examples=False)¶
- prepare_bot(parameters={}, destination_queues=None, prepare_source_queue: bool = True)¶
Reconfigures the bot with the changed attributes.
- Parameters
parameters – optional bot parameters for this run, as dict
destination_queues – optional definition of destination queues default: {“_default”: “{}-output”.format(self.bot_id)}
- prepare_source_queue()¶
- run_bot(iterations: int = 1, error_on_pipeline: bool = False, prepare=True, parameters={}, allowed_error_count=0, allowed_warning_count=0, stop_bot: bool = True)¶
Call this method for actually doing a test run for the specified bot.
- Parameters
iterations – Bot instance will be run the given times, defaults to 1.
parameters – passed to prepare_bot
allowed_error_count – maximum number of allowed errors in the logs
allowed_warning_count – maximum number of allowed warnings in the logs
stop_bot – If the bot should be stopped/shut down after running it. Set to False if you are calling this method again afterwards, as the bot shutdown destroys structures (pipeline, etc.)
- classmethod setUpClass()¶
Set default values and save original functions.
- set_input_queue(seq)¶
Setter for the input queue of this bot
- tearDown()¶
Check if the bot did consume all messages.
Executed after every test run.
- classmethod tearDownClass()¶
- test_bot_name(*args, **kwargs)¶
Test if Bot has a valid name. Must be CamelCase and end with CollectorBot etc.
Accept arbitrary arguments in case the test methods get mocked and get some additional arguments. All arguments are ignored.
- test_static_bot_check_method(*args, **kwargs)¶
Check if the bot’s static check() method completes without errors (exceptions). The return value (errors) is not checked.
The arbitrary parameters for this test function are needed because if a mocker mocks the test class, parameters can be added. See for example intelmq.tests.bots.collectors.http.test_collector.
© 2020 Sebastian Wagner <wagner@cert.at>
SPDX-License-Identifier: AGPL-3.0-or-later
- intelmq.lib.upgrades.v100_dev7_modify_syntax(configuration, harmonization, dry_run, **kwargs)¶
Migrate modify bot configuration format
- intelmq.lib.upgrades.v110_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Checking for deprecated runtime configurations (stomp collector, cymru parser, ripe expert, collector feed parameter)
- intelmq.lib.upgrades.v110_shadowserver_feednames(configuration, harmonization, dry_run, **kwargs)¶
Replace deprecated Shadowserver feednames
- intelmq.lib.upgrades.v111_defaults_process_manager(configuration, harmonization, dry_run, **kwargs)¶
Fix typo in proccess_manager parameter
- intelmq.lib.upgrades.v112_feodo_tracker_domains(configuration, harmonization, dry_run, **kwargs)¶
Search for discontinued feodotracker domains feed
- intelmq.lib.upgrades.v112_feodo_tracker_ips(configuration, harmonization, dry_run, **kwargs)¶
Fix URL of feodotracker IPs feed in runtime configuration
- intelmq.lib.upgrades.v200_defaults_broker(configuration, harmonization, dry_run, **kwargs)¶
Inserting *_pipeline_broker and deleting broker into/from defaults configuration
- intelmq.lib.upgrades.v200_defaults_ssl_ca_certificate(configuration, harmonization, dry_run, **kwargs)¶
Add ssl_ca_certificate to defaults
- intelmq.lib.upgrades.v200_defaults_statistics(configuration, harmonization, dry_run, **kwargs)¶
Inserting statistics_* parameters into defaults configuration file
- intelmq.lib.upgrades.v202_fixes(configuration, harmonization, dry_run, **kwargs)¶
Migrate Collector parameter feed to name. RIPE expert set query_ripe_stat_ip with query_ripe_stat_asn as default. Set cymru whois expert overwrite to true.
- intelmq.lib.upgrades.v210_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Migrating configuration
- intelmq.lib.upgrades.v213_deprecations(configuration, harmonization, dry_run, **kwargs)¶
migrate attach_unzip to extract_files for mail attachment collector
- intelmq.lib.upgrades.v213_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feed configuration for changed feed parameters.
- intelmq.lib.upgrades.v220_azure_collector(configuration, harmonization, dry_run, **kwargs)¶
Checking for the Microsoft Azure collector
- intelmq.lib.upgrades.v220_configuration(configuration, harmonization, dry_run, **kwargs)¶
Migrating configuration
- intelmq.lib.upgrades.v220_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feed configuration for changed feed parameters.
- intelmq.lib.upgrades.v221_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feeds’ configuration for changed/fixed parameters. Deprecation of HP Hosts file feed & parser.
- intelmq.lib.upgrades.v222_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrate Shadowserver feed name
- intelmq.lib.upgrades.v230_csv_parser_parameter_fix(configuration, harmonization, dry_run, **kwargs)¶
Fix CSV parser parameter misspelling
- intelmq.lib.upgrades.v230_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Deprecate malwaredomainlist parser
- intelmq.lib.upgrades.v230_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feeds’ configuration for changed/fixed parameter
- intelmq.lib.upgrades.v233_feodotracker_browse(configuration, harmonization, dry_run, **kwargs)¶
Migrate Abuse.ch Feodotracker Browser feed parsing parameters
- intelmq.lib.upgrades.v300_bots_file_removal(configuration, harmonization, dry_run, **kwargs)¶
Remove BOTS file
- intelmq.lib.upgrades.v300_defaults_file_removal(configuration, harmonization, dry_run, **kwargs)¶
Remove the defaults.conf file
- intelmq.lib.upgrades.v300_pipeline_file_removal(configuration, harmonization, dry_run, **kwargs)¶
Remove the pipeline.conf file
- intelmq.lib.upgrades.v301_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Deprecate malwaredomains parser and collector
- intelmq.lib.upgrades.v310_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feeds’ configuration for changed/fixed parameter
- intelmq.lib.upgrades.v310_shadowserver_feednames(configuration, harmonization, dry_run, **kwargs)¶
Remove legacy Shadowserver feednames
Common utility functions for intelmq.
decode, encode, base64_decode, base64_encode, load_configuration, log, reverse_readline, parse_logline
- class intelmq.lib.utils.RewindableFileHandle(f, condition: ~typing.Optional[~typing.Callable] = <function RewindableFileHandle.<lambda>>)¶
Bases:
object
Can be used for easy retrieval of the last input line, to populate the raw field during CSV parsing and filtering.
- intelmq.lib.utils.base64_decode(value: Union[bytes, str]) str ¶
- Parameters
value – base64 encoded string
- Returns
decoded string
- Return type
retval
Notes
Possible bytes/unicode conversion problems are ignored.
- intelmq.lib.utils.base64_encode(value: Union[bytes, str]) str ¶
- Parameters
value – string to be encoded
- Returns
base64 representation of value
- Return type
retval
Notes
Possible bytes/unicode conversion problems are ignored.
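A short doctest-style sketch of the round trip:
>>> from intelmq.lib.utils import base64_encode, base64_decode
>>> base64_encode('hello')
'aGVsbG8='
>>> base64_decode('aGVsbG8=')
'hello'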
- intelmq.lib.utils.decode(text: Union[bytes, str], encodings: Sequence[str] = ('utf-8',), force: bool = False) str ¶
Decode given string to UTF-8 (default).
- Parameters
text – if unicode string is given, same object is returned
encodings – list/tuple of encodings to use
force – Ignore invalid characters
- Returns
converted unicode string
- Raises
ValueError – if decoding failed
- intelmq.lib.utils.encode(text: Union[bytes, str], encodings: Sequence[str] = ('utf-8',), force: bool = False) bytes ¶
Encode given string from UTF-8 (default).
- Parameters
text – if bytes string is given, same object is returned
encodings – list/tuple of encodings to use
force – Ignore invalid characters
- Returns
converted bytes string
- Raises
ValueError – if encoding failed
- intelmq.lib.utils.error_message_from_exc(exc: Exception) str ¶
>>> exc = IndexError('This is a test')
>>> error_message_from_exc(exc)
'This is a test'
- Parameters
exc –
- Returns
The error message of exc
- Return type
result
- intelmq.lib.utils.file_name_from_response(response: Response) str ¶
Extract the file name from the Content-Disposition header of the Response object or the URL as fallback
- Parameters
response – a Response object retrieved from a call with the requests library
- Returns
The file name
- Return type
file_name
- intelmq.lib.utils.get_global_settings() dict ¶
- intelmq.lib.utils.list_all_bots() dict ¶
Compile a dictionary with all bots and their parameters.
Includes:
* the bots’ names
* the description from the docstring
* parameters including default values
For the parameters, parameters of the Bot class are excluded if they have the same value.
- intelmq.lib.utils.load_configuration(configuration_filepath: str) dict ¶
Load JSON or YAML configuration file.
- Parameters
configuration_filepath – Path to file to load.
- Returns
Parsed configuration
- Return type
config
- Raises
ValueError – if file not found
- intelmq.lib.utils.load_parameters(*configs: dict) Parameters ¶
Load dictionaries into new Parameters() instance.
- Parameters
*configs – Arbitrary number of dictionaries to load.
- Returns
class instance with items of configs as attributes
- Return type
parameters
- intelmq.lib.utils.log(name: str, log_path: Union[str, bool] = '/opt/intelmq/var/log/', log_level: str = 'INFO', stream: Optional[object] = None, syslog: Optional[Union[bool, str, list, tuple]] = None, log_format_stream: str = '%(name)s: %(message)s', logging_level_stream: Optional[str] = None, log_max_size: Optional[int] = 0, log_max_copies: Optional[int] = None)¶
- intelmq.lib.utils.parse_logline(logline: str, regex: str = '^(?P<date>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d+) - (?P<bot_id>([-\\w]+|py\\.warnings))(?P<thread_id>\\.[0-9]+)? - (?P<log_level>[A-Z]+) - (?P<message>.+)$') Union[dict, str] ¶
Parses the given logline string into its components.
- Parameters
logline – logline to be parsed
regex – The regular expression used to parse the line
- Returns
- dictionary with keys: [‘date’, ‘bot_id’, ‘log_level’, ‘message’]
or string if the line can’t be parsed
- Return type
result
See also
LOG_REGEX: Regular expression for default log format of file handler SYSLOG_REGEX: Regular expression for log format of syslog
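A small sketch with a log line in the default file-handler format (the bot id and message are placeholders):
>>> from intelmq.lib.utils import parse_logline
>>> line = '2015-05-29 21:00:24,379 - example-bot - INFO - Bot starts processing.'
>>> parsed = parse_logline(line)
>>> parsed['bot_id'], parsed['log_level']
('example-bot', 'INFO')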
- intelmq.lib.utils.parse_relative(relative_time: str) int ¶
Parse relative time attributes and returns the corresponding minutes.
>>> parse_relative('4 hours')
240
- Parameters
relative_time – a string holding a relative time specification
- Returns
Minutes
- Return type
result
- Raises
ValueError – If relative_time is not parseable
See also
TIMESPANS: Defines the conversion of verbal timespans to minutes
- intelmq.lib.utils.reverse_readline(filename: str, buf_size=100000) Generator[str, None, None] ¶
Submodules¶
intelmq.version module¶
Module contents¶
Licence¶
This software is licensed under the GNU Affero General Public License version 3.
Funded by¶
This project was partially funded by the CEF framework
