intelmq.lib package¶
Subpackages¶
- intelmq.lib.mixins package
- Submodules
- intelmq.lib.mixins.cache module
- intelmq.lib.mixins.http module
HttpMixin
HttpMixin.http_get()
HttpMixin.http_header
HttpMixin.http_password
HttpMixin.http_proxy
HttpMixin.http_session()
HttpMixin.http_timeout_max_tries
HttpMixin.http_timeout_sec
HttpMixin.http_user_agent
HttpMixin.http_username
HttpMixin.http_verify_cert
HttpMixin.https_proxy
HttpMixin.setup()
HttpMixin.ssl_client_cert
TimeoutHTTPAdapter
- intelmq.lib.mixins.sql module
- intelmq.lib.mixins.stomp module
StompMixin
StompMixin.auth_by_ssl_client_certificate
StompMixin.heartbeat
StompMixin.password
StompMixin.port
StompMixin.prepare_stomp_connection()
StompMixin.server
StompMixin.ssl_ca_certificate
StompMixin.ssl_client_certificate
StompMixin.ssl_client_certificate_key
StompMixin.stomp_bot_parameters_check()
StompMixin.stomp_bot_runtime_initial_check()
StompMixin.username
- Module contents
CacheMixin
HttpMixin
HttpMixin.http_get()
HttpMixin.http_header
HttpMixin.http_password
HttpMixin.http_proxy
HttpMixin.http_session()
HttpMixin.http_timeout_max_tries
HttpMixin.http_timeout_sec
HttpMixin.http_user_agent
HttpMixin.http_username
HttpMixin.http_verify_cert
HttpMixin.https_proxy
HttpMixin.setup()
HttpMixin.ssl_client_cert
SQLMixin
StompMixin
StompMixin.auth_by_ssl_client_certificate
StompMixin.heartbeat
StompMixin.password
StompMixin.port
StompMixin.prepare_stomp_connection()
StompMixin.server
StompMixin.ssl_ca_certificate
StompMixin.ssl_client_certificate
StompMixin.ssl_client_certificate_key
StompMixin.stomp_bot_parameters_check()
StompMixin.stomp_bot_runtime_initial_check()
StompMixin.username
Submodules¶
intelmq.lib.bot module¶
- The bot library has the base classes for all bots.
Bot: generic base class for all kinds of bots
CollectorBot: base class for collectors
ParserBot: base class for parsers
- class intelmq.lib.bot.Bot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: bool = None, settings: dict | None = None, source_queue: str | None = None, standalone: bool = False)¶
Bases:
object
Not to be reset when initialized again on reload.
- classmethod _create_argparser()¶
See https://github.com/certtools/intelmq/pull/1524/files#r464606370 for why this code is not in the constructor.
- _parse_common_parameters()¶
Parses and sanitizes commonly used parameters:
extract_files
- _parse_extract_file_parameter(parameter_name: str = 'extract_files')¶
Parses and sanitizes commonly used parameters:
extract_files
- accuracy: int = 100¶
- acknowledge_message()¶
Acknowledges that the last message has been processed, if any.
For bots without source pipeline (collectors), this is a no-op.
- static check(parameters: dict) List[List[str]] | None ¶
The bot’s own check function can perform individual checks on its parameters. init() is not called beforehand; this is a static method which does not require class initialization.
- Parameters:
parameters – Bot’s parameters, defaults and runtime merged together
- Returns:
- None or a list of [log_level, log_message] pairs, both
strings. log_level must be a valid log level.
- Return type:
output
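For illustration, a static check implementation might look like the following sketch; the parameter name api_key is hypothetical and not part of any real bot:

```python
from typing import List, Optional


def check(parameters: dict) -> Optional[List[List[str]]]:
    """Return None if everything is fine, else [log_level, log_message] pairs.

    Mirrors the documented contract: log_level must be a valid log level name.
    """
    results = []
    if not parameters.get('api_key'):  # hypothetical parameter for illustration
        results.append(['error', "Parameter 'api_key' is missing or empty."])
    return results or None
```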
- description: str | None = None¶
- destination_pipeline_broker: str = 'redis'¶
- destination_pipeline_db: int = 2¶
- destination_pipeline_host: str = '127.0.0.1'¶
- destination_pipeline_password: str | None = None¶
- destination_pipeline_port: int = 6379¶
- destination_queues: dict = {}¶
- enabled: bool = True¶
- error_dump_message: bool = True¶
- error_log_exception: bool = True¶
- error_log_message: bool = False¶
- error_max_retries: int = 3¶
- error_procedure: str = 'pass'¶
- error_retry_delay: int = 15¶
- group: str | None = None¶
- property harmonization¶
- http_proxy: str | None = None¶
- http_timeout_max_tries: int = 3¶
- http_timeout_sec: int = 30¶
- http_user_agent: str = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'¶
- http_verify_cert: bool | str = True¶
- https_proxy: str | None = None¶
- init()¶
- instances_threads: int = 0¶
- is_multithreaded: bool = False¶
- load_balance: bool = False¶
- log_processed_messages_count: int = 500¶
- log_processed_messages_seconds: int = 900¶
- logger = None¶
- logging_handler: str = 'file'¶
- logging_level: str = 'INFO'¶
- logging_path: str = '/opt/intelmq/var/log/'¶
- logging_syslog: str = '/dev/log'¶
- module = None¶
- name: str | None = None¶
- new_event(*args, **kwargs)¶
- process_manager: str = 'intelmq'¶
- process_message(*messages: Message | dict)¶
Call the bot’s process method with a prepared source queue. The return value is a dict with the complete pipeline state. Multiple messages can be given as positional arguments. The pipeline needs to be configured accordingly with BotLibSettings, see https://intelmq.readthedocs.io/en/develop/dev/library.html
Access the output queue e.g. with return_value[‘output’]
- rate_limit: int = 0¶
- receive_message() Message ¶
If the bot is reloaded while waiting for an incoming message, the received message will first be returned to the pipeline in order to get to a clean state. Then, after reloading, the message will be retrieved again.
- classmethod run(parsed_args=None)¶
- run_mode: str = 'continuous'¶
- send_message(*messages, path: str = '_default', auto_add=None, path_permissive: bool = False)¶
- Parameters:
messages – Instances of intelmq.lib.message.Message class
auto_add – ignored
path_permissive – If true, do not raise an error if the path is not configured
- set_request_parameters()¶
- shutdown()¶
- source_pipeline_broker: str = 'redis'¶
- source_pipeline_db: int = 2¶
- source_pipeline_host: str = '127.0.0.1'¶
- source_pipeline_password: str | None = None¶
- source_pipeline_port: int = 6379¶
- source_queue: str | None = None¶
- ssl_ca_certificate: str | None = None¶
- start(starting: bool = True, error_on_pipeline: bool = True, error_on_message: bool = False, source_pipeline: str | None = None, destination_pipeline: str | None = None)¶
- statistics_database: int = 3¶
- statistics_host: str = '127.0.0.1'¶
- statistics_password: str | None = None¶
- statistics_port: int = 6379¶
- stop(exitcode: int = 1)¶
- class intelmq.lib.bot.CollectorBot(*args, **kwargs)¶
Bases:
Bot
Base class for collectors.
Performs some sanity checks on message sending.
- accuracy: int = 100¶
- bottype = 'Collector'¶
- code: str | None = None¶
- documentation: str | None = None¶
- name: str | None = None¶
- new_report()¶
- provider: str | None = None¶
- send_message(*messages, path: str = '_default', auto_add: bool = True)¶
- Parameters:
messages – Instances of intelmq.lib.message.Message class
path – Named queue the message will be sent to
auto_add – Add some default report fields from parameters
- class intelmq.lib.bot.ExpertBot(*args, **kwargs)¶
Bases:
Bot
Base class for expert bots.
- bottype = 'Expert'¶
- class intelmq.lib.bot.OutputBot(*args, **kwargs)¶
Bases:
Bot
Base class for outputs.
- bottype = 'Output'¶
- export_event(event: Event, return_type: type | None = None) str | dict ¶
- exports an event according to the following parameters:
message_hierarchical
message_with_type
message_jsondict_as_string
single_key
keep_raw_field
- Parameters:
return_type – Ensure that the returned value is of the given type. Optional. For example: str If the resulting value is not an instance of this type, the given object is called with the value as parameter E.g. str(retval)
- class intelmq.lib.bot.ParserBot(*args, **kwargs)¶
Bases:
Bot
- _get_io_and_save_line_ending(raw: str) StringIO ¶
Prepare StringIO and save the original line ending
The line ending is saved in self._line_ending. The default value is \r\n, the same default used by the csv module.
- bottype = 'Parser'¶
- default_fields: dict | None = {}¶
- parse(report: Report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do the same for recovering lines, e.g.:
recover_line = ParserBot.recover_line_csv
- parse_csv_dict(report: Report)¶
A basic CSV Dictionary parser. The resulting lines are dictionaries with the column names as keys.
- parse_line(line: Any, report: Report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- process()¶
- recover_line(line: str | None = None) str ¶
Reverse of “parse” for single lines.
Recovers a fully functional report with only the problematic line by concatenating all strings in “self.tempdata” with “line” with LF newlines. Works fine for most text files.
- Parameters:
line (Optional[str], optional) – The currently processed line which should be transferred into its original appearance. As fallback, “self._current_line” is used if available (depending on self.parse). The default is None.
- Raises:
ValueError – If neither the parameter “line” nor the member “self._current_line” is available.
- Returns:
- str
The reconstructed raw data.
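The concatenation described above can be sketched as a small stand-alone function (a simplified stand-in, not the actual ParserBot method):

```python
from typing import List


def recover_line(tempdata: List[str], line: str) -> str:
    # Concatenate all saved header/comment strings with the problematic
    # line, joined by LF newlines, as described for ParserBot.recover_line.
    return '\n'.join(tempdata + [line])
```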
- recover_line_csv(line: list | None = None) str ¶
Recover csv line, respecting saved line ending.
- Parameter:
line: Optional line as list. If absent, the current line is used as string.
- recover_line_csv_dict(line: dict | str | None = None) str ¶
Converts dictionaries to csv. self.csv_fieldnames must be list of fields. Respect saved line ending.
- recover_line_json(line: dict) str ¶
Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse. Using a string as input here is not possible, as the input may span over multiple lines. Output is not identical to the input, but has the same content.
- Parameters:
dict. (The line as) –
- Returns:
The JSON-encoded line as string.
- Return type:
str
- recover_line_json_stream(line: str | None = None) str ¶
recover_line for JSON streams (one JSON element per line, no outer structure), just returns the current line, unparsed.
- Parameters:
line – The line itself as dict, if available, falls back to original current line
- Returns:
unparsed JSON line.
- Return type:
str
intelmq.lib.bot_debugger module¶
Utilities for debugging intelmq bots.
BotDebugger is called via intelmqctl. It starts a live running bot instance, raises the logging level to DEBUG and allows even a less experienced programmer, who may be puzzled by Python nuances and server deployment twists, to see what is happening in the bot and where the error lies.
- Depending on the subcommand received, the class either
starts the bot as is (default)
processes a single message, either injected or from the default pipeline (process subcommand)
reads a message from the input pipeline or sends a message to the output pipeline (message subcommand)
- class intelmq.lib.bot_debugger.BotDebugger(runtime_configuration, bot_id, run_subcommand=None, console_type=None, message_kind=None, dryrun=None, msg=None, show=None, loglevel=None)¶
Bases:
object
- EXAMPLE = '\nThe message may look like:\n \'{"source.network": "178.72.192.0/18", "time.observation": "2017-05-12T05:23:06+00:00"}\' '¶
- arg2msg(msg)¶
- instance = None¶
- leverageLogger(level)¶
- load_configuration() dict ¶
Load JSON or YAML configuration file.
- Parameters:
configuration_filepath – Path to file to load.
- Returns:
Parsed configuration
- Return type:
config
- Raises:
ValueError – if file not found
- static load_configuration_patch(configuration_filepath: str, *args, **kwargs) dict ¶
Mock function for utils.load_configuration which ensures the logging level parameter is set to the value we want. If a runtime configuration is detected, the logging_level parameter is inserted in all bots’ parameters (bot_id is not accessible here, hence we add it everywhere) as well as in the global parameters (ex-defaults). Maybe not everything is necessary, but this makes sure logging_level is set everywhere it might be relevant, also in the future.
- logging_level = None¶
- messageWizzard(msg)¶
- output = []¶
- outputappend(msg)¶
- static pprint(msg) str ¶
We can’t use the standard pprint, as the JSON standard requires double quotes.
- run() str ¶
intelmq.lib.cache module¶
Cache is a set of information already seen by the system. It provides a way, for example, to remove duplicated events and reports in the system, or to cache results from experts like Cymru Whois. A TTL value can be defined for each piece of information inserted into the cache; the TTL determines how long the system keeps that information.
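As a conceptual sketch of the TTL behaviour only (IntelMQ’s actual Cache class is backed by Redis, which handles key expiry natively):

```python
import time


class TTLCache:
    """Illustrative in-memory cache with per-entry TTL; not the real Cache API."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl: float) -> None:
        # Store the value together with its absolute expiry time
        self._store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        item = self._store.get(key)
        if item is not None and item[1] > time.monotonic():
            return item[0]
        # Expired or missing: drop the entry and report a miss
        self._store.pop(key, None)
        return None
```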
intelmq.lib.datatypes module¶
- class intelmq.lib.datatypes.BotType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
str
, Enum
- COLLECTOR = 'Collector'¶
- EXPERT = 'Expert'¶
- OUTPUT = 'Output'¶
- PARSER = 'Parser'¶
- _generate_next_value_(start, count, last_values)¶
Generate the next value when not given.
name: the name of the member
start: the initial start value or None
count: the number of existing members
last_values: the list of values assigned
- toJson()¶
- class intelmq.lib.datatypes.LogLevel(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
Enum
- CRITICAL = 4¶
- DEBUG = 0¶
- ERROR = 3¶
- INFO = 1¶
- WARNING = 2¶
- class intelmq.lib.datatypes.ReturnType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
str
, Enum
- JSON = 'Json'¶
- PYTHON = 'Python'¶
- TEXT = 'Text'¶
- _generate_next_value_(start, count, last_values)¶
Generate the next value when not given.
name: the name of the member
start: the initial start value or None
count: the number of existing members
last_values: the list of values assigned
- toJson()¶
- class intelmq.lib.datatypes.TimeFormat(value: str | None = None)¶
Bases:
str
Pydantic style Field Type class for bot parameter time_format. Used for validation.
- parse_datetime(value: str, return_datetime: bool = False) datetime | str ¶
This function uses the selected conversion function to parse the datetime value.
- Parameters:
value – external datetime string
return_datetime – whether to return string or datetime object
- Returns:
parsed datetime or string
- static validate(value: str) [Callable, Optional[str]] ¶
This function validates the time_format parameter value.
- Parameters:
value – bot parameter for datetime conversion
- Returns:
correct time conversion function and the format string
intelmq.lib.exceptions module¶
IntelMQ Exception Class
- exception intelmq.lib.exceptions.ConfigurationError(config: str, argument: str)¶
Bases:
IntelMQException
- exception intelmq.lib.exceptions.IntelMQException(message)¶
Bases:
Exception
- exception intelmq.lib.exceptions.IntelMQHarmonizationException(message)¶
Bases:
IntelMQException
- exception intelmq.lib.exceptions.InvalidArgument(argument: Any, got: Any = None, expected=None, docs: str = None)¶
Bases:
IntelMQException
- exception intelmq.lib.exceptions.InvalidKey(key: str, additional_text: str | None = None)¶
Bases:
IntelMQHarmonizationException
, KeyError
- exception intelmq.lib.exceptions.InvalidValue(key: str, value: str, reason: Any = None, object: bytes = None)¶
- exception intelmq.lib.exceptions.KeyExists(key: str)¶
- exception intelmq.lib.exceptions.KeyNotExists(key: str)¶
- exception intelmq.lib.exceptions.MissingDependencyError(dependency: str, version: str | None = None, installed: str | None = None, additional_text: str | None = None)¶
Bases:
IntelMQException
A missing dependency was detected. Log instructions on installation.
- __init__(dependency: str, version: str | None = None, installed: str | None = None, additional_text: str | None = None)¶
- Parameters:
dependency (str) – The dependency name.
version (Optional[str], optional) – The required version. The default is None.
installed (Optional[str], optional) – The currently installed version. Requires ‘version’ to be given The default is None.
additional_text (Optional[str], optional) – Arbitrary additional text to show. The default is None.
- Returns:
with prepared text
- Return type:
- exception intelmq.lib.exceptions.PipelineError(argument: str | Exception)¶
Bases:
IntelMQException
intelmq.lib.harmonization module¶
The following types are implemented with sanitize() and is_valid() functions:
Base64
Boolean
ClassificationTaxonomy
ClassificationType
DateTime
FQDN
Float
Accuracy
GenericType
IPAddress
IPNetwork
Integer
JSON
JSONDict
LowercaseString
Registry
String
URL
ASN
UppercaseString
TLP
- class intelmq.lib.harmonization.ASN¶
Bases:
Integer
ASN type. Derived from Integer with forbidden values.
Only valid are: 0 &lt; asn &lt;= 4294967295. See https://en.wikipedia.org/wiki/Autonomous_system_(Internet): “The first and last ASNs of the original 16-bit integers, namely 0 and 65,535, and the last ASN of the 32-bit numbers, namely 4,294,967,295 are reserved and should not be used by operators.”
- static check_asn(value: int) bool ¶
- static is_valid(value: int, sanitize: bool = False) bool ¶
- static sanitize(value: int) int | None ¶
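The validity rule stated above can be expressed directly; this is a sketch mirroring the documented range check, not the actual method:

```python
def check_asn(value: int) -> bool:
    # Follows the documented rule: 0 < asn <= 4294967295
    return 0 < value <= 4294967295
```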
- class intelmq.lib.harmonization.Accuracy¶
Bases:
Float
Accuracy type. A Float between 0 and 100.
- static is_valid(value: float, sanitize: bool = False) bool ¶
- static sanitize(value: float) float | None ¶
- class intelmq.lib.harmonization.Base64¶
Bases:
String
Base64 type. Always gives unicode strings.
Sanitation encodes to base64 and accepts binary and unicode strings.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- class intelmq.lib.harmonization.Boolean¶
Bases:
GenericType
Boolean type. Without sanitation only python bool is accepted.
Sanitation accepts string ‘true’ and ‘false’ and integers 0 and 1.
- static is_valid(value: bool, sanitize: bool = False) bool ¶
- static sanitize(value: bool) bool | None ¶
- class intelmq.lib.harmonization.ClassificationTaxonomy¶
Bases:
String
classification.taxonomy type.
The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/
- These old values are automatically mapped to the new ones:
‘abusive content’ -> ‘abusive-content’
‘information gathering’ -> ‘information-gathering’
‘intrusion attempts’ -> ‘intrusion-attempts’
‘malicious code’ -> ‘malicious-code’
- Allowed values are:
abusive-content
availability
fraud
information-content-security
information-gathering
intrusion-attempts
intrusions
malicious-code
other
test
vulnerable
- allowed_values = ['abusive-content', 'availability', 'fraud', 'information-content-security', 'information-gathering', 'intrusion-attempts', 'intrusions', 'malicious-code', 'other', 'test', 'vulnerable']¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- class intelmq.lib.harmonization.ClassificationType¶
Bases:
String
classification.type type.
The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/ with extensions.
- These old values are automatically mapped to the new ones:
‘botnet drone’ -> ‘infected-system’
‘ids alert’ -> ‘ids-alert’
‘c&c’ -> ‘c2-server’
‘c2server’ -> ‘c2-server’
‘infected system’ -> ‘infected-system’
‘malware configuration’ -> ‘malware-configuration’
‘Unauthorised-information-access’ -> ‘unauthorised-information-access’
‘leak’ -> ‘data-leak’
‘vulnerable client’ -> ‘vulnerable-system’
‘vulnerable service’ -> ‘vulnerable-system’
‘ransomware’ -> ‘infected-system’
‘unknown’ -> ‘undetermined’
- These values changed their taxonomy:
- ‘malware’: In terms of the taxonomy ‘malicious-code’, it can be either ‘infected-system’ or ‘malware-distribution’; but for actual malware, it now has the taxonomy ‘other’.
- Allowed values are:
application-compromise
blacklist
brute-force
burglary
c2-server
copyright
data-leak
data-loss
ddos
ddos-amplifier
dga-domain
dos
exploit
harmful-speech
ids-alert
infected-system
information-disclosure
malware
malware-configuration
malware-distribution
masquerade
misconfiguration
other
outage
phishing
potentially-unwanted-accessible
privileged-account-compromise
proxy
sabotage
scanner
sniffing
social-engineering
spam
system-compromise
test
tor
unauthorised-information-access
unauthorised-information-modification
unauthorized-use-of-resources
undetermined
unprivileged-account-compromise
violence
vulnerable-system
weak-crypto
- allowed_values = ('application-compromise', 'blacklist', 'brute-force', 'burglary', 'c2-server', 'copyright', 'data-leak', 'data-loss', 'ddos', 'ddos-amplifier', 'dga-domain', 'dos', 'exploit', 'harmful-speech', 'ids-alert', 'infected-system', 'information-disclosure', 'malware', 'malware-configuration', 'malware-distribution', 'masquerade', 'misconfiguration', 'other', 'outage', 'phishing', 'potentially-unwanted-accessible', 'privileged-account-compromise', 'proxy', 'sabotage', 'scanner', 'sniffing', 'social-engineering', 'spam', 'system-compromise', 'test', 'tor', 'unauthorised-information-access', 'unauthorised-information-modification', 'unauthorized-use-of-resources', 'undetermined', 'unprivileged-account-compromise', 'violence', 'vulnerable-system', 'weak-crypto')¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- class intelmq.lib.harmonization.DateTime¶
Bases:
String
Date and time type for timestamps.
Valid values are timestamps with time zone and in the format ‘%Y-%m-%dT%H:%M:%S+00:00’. Invalid are missing times and missing timezone information (UTC). Microseconds are also allowed.
Sanitation normalizes the timezone to UTC, which is the only allowed timezone.
The following additional conversions are available with the convert function:
timestamp
windows_nt: From Windows NT / AD / LDAP
epoch_millis: From Milliseconds since Epoch
from_format: From a given format, e.g. ‘from_format|%H %M %S %m %d %Y %Z’
from_format_midnight: Date from a given format and assume midnight, e.g. ‘from_format_midnight|%d-%m-%Y’
utc_isoformat: Parse date generated by datetime.isoformat()
fuzzy (or None): Use dateutil’s fuzzy parser, default if no specific parser is given
- TIME_CONVERSIONS = {'epoch_millis': <function DateTime.from_epoch_millis>, 'from_format': <function DateTime.from_format>, 'from_format_midnight': <function DateTime.from_format_midnight>, 'fuzzy': <function DateTime.from_fuzzy>, 'timestamp': <function DateTime.from_timestamp>, 'utc_isoformat': <function DateTime.from_isoformat>, 'windows_nt': <function DateTime.from_windows_nt>, None: <function DateTime.from_fuzzy>}¶
- static convert(value, format='fuzzy') str ¶
Converts date time strings according to the given format. If the timezone is not given or clear, the local time zone is assumed!
timestamp
windows_nt: From Windows NT / AD / LDAP
epoch_millis: From Milliseconds since Epoch
from_format: From a given format, e.g. ‘from_format|%H %M %S %m %d %Y %Z’
from_format_midnight: Date from a given format and assume midnight, e.g. ‘from_format_midnight|%d-%m-%Y’
utc_isoformat: Parse date generated by datetime.isoformat()
fuzzy (or None): Use dateutil’s fuzzy parser, default if no specific parser is given
- static convert_from_format(value: str, format: str) str ¶
This function is replaced by ‘from_format’ function. The original name is kept for backwards compatibility and will be removed in version 4.0.
- static convert_from_format_midnight(value: str, format: str) str ¶
This function is replaced by ‘from_format_midnight’ function. The original name is kept for backwards compatibility and will be removed in version 4.0.
- static convert_fuzzy(value) str ¶
This function is replaced by ‘from_fuzzy’ function. The original name is kept for backwards compatibility and will be removed in version 4.0.
- static from_epoch_millis(value: int | str, return_datetime: bool = False) datetime | str ¶
Returns an ISO-formatted datetime from the given epoch timestamp with milliseconds. It drops the milliseconds, converts the value into a normal timestamp and processes it.
- static from_format(value: str, format: str, return_datetime: bool = False) datetime | str ¶
Converts a datetime with the given format.
- static from_format_midnight(value: str, format: str, return_datetime: bool = False) datetime | str ¶
Converts a date with the given format and adds time 00:00:00 to it.
- static from_fuzzy(value, return_datetime: bool = False) datetime | str ¶
- static from_isoformat(value: str, return_datetime: bool = False) datetime | str ¶
Parses datetime string in ISO format. Naive datetime strings (without timezone) are assumed to be in UTC. It is much faster than universal dateutil parser. Can be used for parsing DateTime fields which are already parsed.
Returns a string with ISO format. If return_datetime is True, the return value is a datetime.datetime object.
- static from_timestamp(value: int | float | str, return_datetime: bool = False) datetime | str ¶
Returns ISO formatted datetime from given timestamp.
- static from_windows_nt(value: int | str, return_datetime: bool = False) datetime | str ¶
Converts the Windows NT / LDAP / Active Directory format to ISO format.
The format is: 100 nanoseconds (10^-7s) since 1601-01-01. UTC is assumed.
- Parameters:
value – Time in LDAP format as integer or string. Will be converted if necessary.
return_datetime – Whether to return datetime object or just string.
- Returns:
Converted ISO format string
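The conversion described above can be sketched as follows; sub-microsecond precision is truncated and the return_datetime handling is omitted:

```python
from datetime import datetime, timedelta, timezone

# Windows NT / AD / LDAP timestamps count 100-nanosecond intervals
# since 1601-01-01 00:00:00 UTC.
_NT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def from_windows_nt(value: int) -> str:
    # One tick is 100 ns = 0.1 microseconds; integer division truncates
    # the sub-microsecond remainder.
    return (_NT_EPOCH + timedelta(microseconds=value // 10)).isoformat()
```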
- static generate_datetime_now() str ¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- midnight = datetime.time(0, 0)¶
- static parse_utc_isoformat(value: str, return_datetime: bool = False) datetime | str ¶
This function is replaced by ‘from_isoformat’ function. The original name is kept for backwards compatibility and will be removed in version 4.0.
- static sanitize(value: str) str | None ¶
- class intelmq.lib.harmonization.FQDN¶
Bases:
String
Fully qualified domain name type.
All valid lowercase domains are accepted, no IP addresses or URLs. Trailing dot is not allowed.
To prevent values like ‘10.0.0.1:8080’ (#1235), we check for the non-existence of ‘:’.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- static to_ip(value: str) str | None ¶
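A rough sketch of the acceptance rules described above; the regular expression is a simplification and, unlike the real implementation, does not reject IP addresses or apply full domain validation:

```python
import re


def looks_like_fqdn(value: str) -> bool:
    # Per the rules above: no ':' (rejects host:port), no trailing dot,
    # lowercase only.
    if ':' in value or value.endswith('.') or value != value.lower():
        return False
    # At least two labels of letters/digits/hyphens (simplified pattern)
    return re.fullmatch(r'[a-z0-9-]+(\.[a-z0-9-]+)+', value) is not None
```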
- class intelmq.lib.harmonization.Float¶
Bases:
GenericType
Float type. Without sanitation only python float/integer/long is accepted. Boolean is explicitly denied.
Sanitation accepts strings and everything float() accepts.
- static is_valid(value: float, sanitize: bool = False) bool ¶
- static sanitize(value: float) float | None ¶
- class intelmq.lib.harmonization.GenericType¶
Bases:
object
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value) str | None ¶
- class intelmq.lib.harmonization.IPAddress¶
Bases:
String
Type for IP addresses, all families. Uses the ipaddress module.
Sanitation accepts integers, strings and objects of ipaddress.IPv4Address and ipaddress.IPv6Address.
Valid values are only strings. 0.0.0.0 is explicitly not allowed.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: int | str) str | None ¶
- static to_int(value: str) int | None ¶
- static to_reverse(ip_addr: str) str ¶
- static version(value: str) int ¶
- class intelmq.lib.harmonization.IPNetwork¶
Bases:
String
Type for IP networks, all families. Uses the ipaddress module.
Sanitation accepts strings and objects of ipaddress.IPv4Network and ipaddress.IPv6Network. If host bits in strings are set, they will be ignored (e.g. 127.0.0.1/32).
Valid values are only strings.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- static version(value: str) int ¶
- class intelmq.lib.harmonization.Integer¶
Bases:
GenericType
Integer type. Without sanitation only python integer/long is accepted. Bool is explicitly denied.
Sanitation accepts strings and everything int() accepts.
- static is_valid(value: int, sanitize: bool = False) bool ¶
- static sanitize(value: int) int | None ¶
- class intelmq.lib.harmonization.JSON¶
Bases:
String
JSON type.
Sanitation accepts any valid JSON objects.
Valid values are only unicode strings with JSON objects.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- class intelmq.lib.harmonization.JSONDict¶
Bases:
JSON
JSONDict type.
Sanitation accepts pythons dictionaries and JSON strings.
Valid values are only unicode strings with JSON dictionaries.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static is_valid_subitem(value: str) bool ¶
- static sanitize(value: str) str | None ¶
- static sanitize_subitem(value: str) str ¶
- class intelmq.lib.harmonization.LowercaseString¶
Bases:
String
Like string, but only allows lower case characters.
Sanitation lowers all characters.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) bool | None ¶
- class intelmq.lib.harmonization.Registry¶
Bases:
UppercaseString
Registry type. Derived from UppercaseString.
Only valid values: AFRINIC, APNIC, ARIN, LACNIC, RIPE. RIPE-NCC and RIPENCC are normalized to RIPE.
- ENUM = ['AFRINIC', 'APNIC', 'ARIN', 'LACNIC', 'RIPE']¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str ¶
- class intelmq.lib.harmonization.String¶
Bases:
GenericType
Any non-empty string without leading or trailing whitespace.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- class intelmq.lib.harmonization.TLP¶
Bases:
UppercaseString
TLP level type. Derived from UppercaseString.
Only valid values: WHITE, GREEN, AMBER, RED.
Accepted for sanitation are different cases and the prefix ‘tlp:’.
- enum = ['WHITE', 'GREEN', 'AMBER', 'RED']¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- prefix_pattern = re.compile('^(TLP:?)?\\s*')¶
- static sanitize(value: str) str | None ¶
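The sanitation described above can be sketched with the documented prefix pattern; sanitize_tlp is an illustrative name, not the actual API:

```python
import re

TLP_LEVELS = ['WHITE', 'GREEN', 'AMBER', 'RED']
# The prefix pattern documented above: an optional 'TLP:' plus whitespace
PREFIX_PATTERN = re.compile(r'^(TLP:?)?\s*')


def sanitize_tlp(value: str):
    # Normalize case, strip the optional 'tlp:' prefix, then validate
    value = PREFIX_PATTERN.sub('', value.strip().upper())
    return value if value in TLP_LEVELS else None
```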
- class intelmq.lib.harmonization.URL¶
Bases:
String
URI type. Local and remote.
Sanitation converts hxxp and hxxps to http and https. For local URIs (file) a missing host is replaced by localhost.
Valid values must have the host (network location part).
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- static to_domain_name(url: str) str | None ¶
- static to_ip(url: str) str | None ¶
intelmq.lib.message module¶
Messages are the information packages in pipelines.
Use MessageFactory to get a Message object (types Report and Event).
- class intelmq.lib.message.Event(message: dict | tuple = (), auto: bool = False, harmonization: dict | None = None)¶
Bases:
Message
- __init__(message: dict | tuple = (), auto: bool = False, harmonization: dict | None = None) None ¶
- Parameters:
message – If a Report is given, feed.name, feed.url and time.observation will be used to construct the Event. If it’s another type, the value is passed to dict’s init
auto – unused here
harmonization – Harmonization definition to use
- class intelmq.lib.message.Message(message: dict | tuple = (), auto: bool = False, harmonization: dict = None)¶
Bases:
dict
- add(key: str, value: str, sanitize: bool = True, overwrite: bool | None = None, ignore: Sequence = (), raise_failure: bool = True) bool | None ¶
Add a value for the key (after sanitation).
- Parameters:
key – Key as defined in the harmonization
value – A valid value as defined in the harmonization If the value is None or in _IGNORED_VALUES the value will be ignored. If the value is ignored, the key exists and overwrite is True, the key is deleted.
sanitize – Sanitation of harmonization type will be called before validation (default: True)
overwrite – Overwrite an existing value if it already exists (default: None) If True, overwrite an existing value If False, do not overwrite an existing value If None, raise intelmq.exceptions.KeyExists for an existing value
raise_failure – Whether an intelmq.lib.exceptions.InvalidValue should be raised for invalid values (default: True). If False, the return value will be False in case of invalid values.
- Returns:
True if the value has been added.
- False if the value is invalid and raise_failure is False or the value existed
and has not been overwritten.
None if the value has been ignored.
- Raises:
intelmq.lib.exceptions.KeyExists – If key exists and won’t be overwritten explicitly.
intelmq.lib.exceptions.InvalidKey – if key is invalid.
intelmq.lib.exceptions.InvalidArgument – if ignore is not a list or tuple.
intelmq.lib.exceptions.InvalidValue – If value is not valid for the given key and raise_failure is True.
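The overwrite/ignore semantics of add() can be sketched with a minimal dict subclass. MiniMessage is a hypothetical illustration of the documented return values, not the real Message class (which also performs harmonization-based validation and raises intelmq-specific exceptions):

```python
class MiniMessage(dict):
    """Simplified sketch of Message.add()'s documented overwrite semantics."""

    _IGNORED_VALUES = {"", "-", "N/A"}  # assumed placeholder set for this sketch

    def add(self, key, value, overwrite=None, ignore=()):
        if value is None or value in self._IGNORED_VALUES or value in ignore:
            # Ignored value: delete the key if it exists and overwrite is True
            if key in self and overwrite:
                del self[key]
            return None
        if key in self:
            if overwrite is None:
                # The real class raises intelmq.lib.exceptions.KeyExists here
                raise KeyError(f"Key {key!r} already exists")
            if not overwrite:
                return False
        self[key] = value
        return True
```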
- change(key: str, value: str, sanitize: bool = True)¶
- copy() a shallow copy of D ¶
- deep_copy()¶
- finditems(keyword: str)¶
- get(key, default=None)¶
Return the value for key if key is in the dictionary, else default.
- hash(*, filter_keys: Iterable = frozenset({}), filter_type: str = 'blacklist')¶
Return a SHA256 hash of the message as a hexadecimal string. The hash is computed over almost all key/value pairs. Depending on the filter_type parameter (blacklist or whitelist), the keys given in the filter_keys parameter are either ignored or the only ones considered. If given, filter_keys should be a set.
‘time.observation’ will always be ignored.
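The filtering behavior can be sketched as follows. This is an illustrative stand-in, not intelmq's actual hashing code; the exact serialization of the key/value pairs is an assumption here:

```python
import hashlib
import json

def message_hash(event: dict, filter_keys=frozenset(), filter_type="blacklist") -> str:
    # SHA256 over the sorted key/value pairs; 'time.observation' is always ignored
    def keep(key):
        if key == "time.observation":
            return False
        if filter_type == "blacklist":
            return key not in filter_keys
        return key in filter_keys  # whitelist: only the listed keys count
    items = sorted((k, v) for k, v in event.items() if keep(k))
    return hashlib.sha256(json.dumps(items).encode()).hexdigest()
```

Two events that differ only in time.observation therefore hash identically, which is what makes the hash usable for deduplication.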
- is_valid(key: str, value: str, sanitize: bool = True) bool ¶
Checks if a value is valid for the key (after sanitation).
- Parameters:
key – Key of the field
value – Value of the field
sanitize – Sanitation of harmonization type will be called before validation (default: True)
- Returns:
True if the value is valid, otherwise False
- Raises:
intelmq.lib.exceptions.InvalidKey – if given key is invalid.
- serialize()¶
- set_default_value(value: Any = None)¶
Sets a default value for items.
- to_dict(hierarchical: bool = False, with_type: bool = False, jsondict_as_string: bool = False) dict ¶
Returns a copy of self, only based on a dict class.
- Parameters:
hierarchical – Split all keys at a dot and save these subitems in dictionaries.
with_type – Add a value named __type containing the message type
jsondict_as_string – If False (default) treat values in JSONDict fields just as normal ones If True, save such fields as JSON-encoded string. This is the old behavior before version 1.1.
- Returns:
- A dictionary as copy of itself modified according
to the given parameters
- Return type:
new_dict
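The hierarchical=True option splits keys at dots into nested dictionaries. A self-contained sketch of that transformation (a hypothetical helper, not the method itself):

```python
def to_hierarchical(flat: dict) -> dict:
    # Split each key at the dots and build nested dictionaries,
    # as to_dict(hierarchical=True) does
    result = {}
    for key, value in flat.items():
        node = result
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result
```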
- to_json(hierarchical=False, with_type=False, jsondict_as_string=False)¶
- static unserialize(message_string: str)¶
- update([E, ]**F) None. Update D from dict/iterable E and F. ¶
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
- class intelmq.lib.message.MessageFactory¶
Bases:
object
unserialize: JSON-encoded message to object
serialize: object to JSON-encoded message
- static from_dict(message: dict, harmonization=None, default_type: str | None = None) dict ¶
Takes a dictionary, returns an instance of the correct Message class.
- Parameters:
message – the message which should be converted to a Message object
harmonization – a dictionary holding the used harmonization
default_type – If ‘__type’ is not present in message, the given type will be used
See also
MessageFactory.unserialize MessageFactory.serialize
- static serialize(message)¶
Takes instance of message-derived class and makes JSON-encoded Message.
The class is saved in __type attribute.
- static unserialize(raw_message: str, harmonization: dict = None, default_type: str | None = None) dict ¶
Takes JSON-encoded Message object, returns instance of correct class.
- Parameters:
raw_message – the message which should be converted to a Message object
harmonization – a dictionary holding the used harmonization
default_type – If ‘__type’ is not present in message, the given type will be used
See also
MessageFactory.from_dict MessageFactory.serialize
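The serialize/unserialize round trip can be sketched with plain dicts. This only illustrates the documented __type mechanism; the real factory returns Report/Event instances rather than tuples:

```python
import json

def serialize(message: dict, message_type: str) -> str:
    # The message type is stored in the __type attribute, as documented
    return json.dumps({**message, "__type": message_type})

def unserialize(raw_message: str, default_type=None):
    data = json.loads(raw_message)
    # If __type is absent, fall back to default_type
    message_type = data.pop("__type", default_type)
    return message_type, data
```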
- class intelmq.lib.message.Report(message: dict | tuple = (), auto: bool = False, harmonization: dict | None = None)¶
Bases:
Message
- __init__(message: dict | tuple = (), auto: bool = False, harmonization: dict | None = None) None ¶
- Parameters:
message – Passed along to Message’s and dict’s init. If this is an instance of the Event class, the resulting Report instance has only the fields which are possible in Report, all others are stripped.
auto – if False (default), time.observation is automatically added.
harmonization – Harmonization definition to use
- copy() a shallow copy of D ¶
intelmq.lib.pipeline module¶
Algorithm¶
[Receive] BRPOPLPUSH source_queue -> internal_queue
[Send] LPUSH message -> destination_queue
[Acknowledge] RPOP message <- internal_queue
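The three steps above can be simulated with in-memory deques. This is a sketch of the queue semantics only, not the Redis-backed implementation:

```python
from collections import deque

# LPUSH corresponds to appendleft, RPOP to pop from the right
source_queue, internal_queue, destination_queue = deque(), deque(), deque()

def receive() -> str:
    # BRPOPLPUSH: pop from the right of source_queue, push onto internal_queue
    message = source_queue.pop()
    internal_queue.appendleft(message)
    return message

def send(message: str) -> None:
    # LPUSH onto the destination queue
    destination_queue.appendleft(message)

def acknowledge() -> str:
    # RPOP: drop the processed message from the internal queue
    return internal_queue.pop()
```

Keeping the in-flight message on an internal queue is what allows a bot to crash after receive() without losing the message: it is only removed on acknowledge().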
- class intelmq.lib.pipeline.Amqp(logger, pipeline_args: dict = None, load_balance=False, is_multithreaded=False)¶
Bases:
Pipeline
- check_connection()¶
- clear_queue(queue: str) bool ¶
- connect()¶
- count_queued_messages(*queues) dict ¶
- destination_pipeline_amqp_exchange = ''¶
- destination_pipeline_amqp_virtual_host = '/'¶
- destination_pipeline_db = 2¶
- destination_pipeline_host = '127.0.0.1'¶
- destination_pipeline_password = None¶
- destination_pipeline_socket_timeout = None¶
- destination_pipeline_ssl = False¶
- destination_pipeline_username = None¶
- disconnect()¶
- intelmqctl_rabbitmq_monitoring_url = None¶
- load_configurations(queues_type)¶
- nonempty_queues() set ¶
- queue_args = {'x-queue-mode': 'lazy'}¶
- send(message: str, path: str = '_default', path_permissive: bool = False)¶
In principle we could use AMQP’s exchanges here, but that architecture is incompatible with the format of our pipeline configuration.
- set_queues(queues: dict, queues_type: str)¶
- Parameters:
queues – For the source queue, it’s just a string. For the destination queue, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
queues_type – “source” or “destination”
The method ensures self.destination_queues is a dict of lists. It does not guarantee that a ‘_default’ key exists.
- setup_channel()¶
- source_pipeline_amqp_exchange = ''¶
- source_pipeline_amqp_virtual_host = '/'¶
- source_pipeline_db = 2¶
- source_pipeline_host = '127.0.0.1'¶
- source_pipeline_password = None¶
- source_pipeline_socket_timeout = None¶
- source_pipeline_ssl = False¶
- source_pipeline_username = None¶
- class intelmq.lib.pipeline.Pipeline(logger, pipeline_args: dict = None, load_balance=False, is_multithreaded=False)¶
Bases:
object
- acknowledge()¶
Acknowledge/delete the current message from the source queue
- Raises:
exceptions.PipelineError – If no message is held
- Returns:
None
- clear_queue(queue)¶
- connect()¶
- disconnect()¶
- has_internal_queues = False¶
- nonempty_queues() set ¶
- receive() str ¶
- reject_message()¶
- send(message: str, path: str = '_default', path_permissive: bool = False)¶
- set_queues(queues: str | None, queues_type: str)¶
- Parameters:
queues – For the source queue, it’s just a string. For the destination queue, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
queues_type – “source” or “destination”
The method ensures self.destination_queues is a dict of lists. It does not guarantee that a ‘_default’ key exists.
- class intelmq.lib.pipeline.PipelineFactory¶
Bases:
object
- static create(logger, broker=None, direction=None, queues=None, pipeline_args: dict | None = None, load_balance=False, is_multithreaded=False)¶
direction: “source” or “destination”, optional, needed for queues
queues: needs direction to be set, calls set_queues
bot: Bot instance
- class intelmq.lib.pipeline.Pythonlist(logger, pipeline_args: dict = None, load_balance=False, is_multithreaded=False)¶
Bases:
Pipeline
This pipeline uses simple lists and is intended for testing purposes only.
It behaves in most ways like a normal pipeline, including all encoding and decoding steps, but works entirely without external modules and programs. Data is saved as it comes (no conversion) and operations are non-blocking.
- _acknowledge()¶
Removes a message from the internal queue and returns it
- _receive() bytes ¶
Receives the last not yet acknowledged message.
Does not block unlike the other pipelines.
- _reject_message()¶
No-op because of the internal queue
- clear_all_queues()¶
Empties all queues / state
- clear_queue(queue)¶
Empties given queue.
- connect()¶
- count_queued_messages(*queues) dict ¶
Returns the number of queued messages over all given queue names.
- disconnect()¶
- send(message: str, path: str = '_default', path_permissive: bool = False)¶
Sends a message to the destination queues
- set_queues(queues, queues_type)¶
- Parameters:
queues – For the source queue, it’s just a string. For the destination queue, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
queues_type – “source” or “destination”
The method ensures self.destination_queues is a dict of lists. It does not guarantee that a ‘_default’ key exists.
- state: Dict[str, list] = {}¶
- class intelmq.lib.pipeline.Redis(logger, pipeline_args: dict = None, load_balance=False, is_multithreaded=False)¶
Bases:
Pipeline
- _reject_message()¶
Rejecting is a no-op as the message is in the internal queue anyway.
- clear_queue(queue)¶
Clears a queue by removing (deleting) the key, which is the same as an empty list in Redis
- connect()¶
- count_queued_messages(*queues) dict ¶
- destination_pipeline_db = 2¶
- destination_pipeline_host = '127.0.0.1'¶
- destination_pipeline_password = None¶
- disconnect()¶
- has_internal_queues = True¶
- load_configurations(queues_type)¶
- nonempty_queues() set ¶
Returns a list of all currently non-empty queues.
- pipe = None¶
- send(message: str, path: str = '_default', path_permissive: bool = False)¶
- set_queues(queues, queues_type)¶
- Parameters:
queues – For the source queue, it’s just a string. For the destination queue, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
queues_type – “source” or “destination”
The method ensures self.destination_queues is a dict of lists. It does not guarantee that a ‘_default’ key exists.
- source_pipeline_db = 2¶
- source_pipeline_host = '127.0.0.1'¶
- source_pipeline_password = None¶
intelmq.lib.processmanager module¶
- class intelmq.lib.processmanager.IntelMQProcessManager(*args, **kwargs)¶
Bases:
ProcessManagerInterface
- PIDDIR = '/opt/intelmq/var/run/'¶
- PIDFILE = '/opt/intelmq/var/run/{}.pid'¶
- static _interpret_commandline(pid: int, cmdline: Iterable[str], module: str, bot_id: str) bool | str ¶
Separate function to allow easy testing
Parameters¶
- pid : int
Process ID, used for return values (error messages) only.
- cmdline : Iterable[str]
The command line of the process.
- module : str
The module of the bot.
- bot_id : str
The ID of the bot.
Returns¶
- Union[bool, str]
True if the command line matches the given bot, otherwise an error message.
- bot_reload(bot_id, getstatus=True)¶
- bot_run(bot_id, run_subcommand=None, console_type=None, message_action_kind=None, dryrun=None, msg=None, show_sent=None, loglevel=None)¶
- bot_start(bot_id, getstatus=True)¶
- bot_status(bot_id, *, proc=None)¶
- bot_stop(bot_id, getstatus=True)¶
- class intelmq.lib.processmanager.ProcessManagerInterface(interactive: bool, runtime_configuration: dict, logger: Logger, returntype: ReturnType, quiet: bool)¶
Bases:
object
Defines the interface all process managers must adhere to
- abstract bot_reload(bot_id: str, getstatus=True)¶
- abstract bot_run(bot_id: str, run_subcommand=None, console_type=None, message_action_kind=None, dryrun=None, msg=None, show_sent=None, loglevel=None)¶
- abstract bot_start(bot_id: str, getstatus=True)¶
- abstract bot_status(bot_id: str) str ¶
- abstract bot_stop(bot_id: str, getstatus=True)¶
- class intelmq.lib.processmanager.SupervisorProcessManager(interactive: bool, runtime_configuration: dict, logger: Logger, returntype: ReturnType, quiet: bool)¶
Bases:
ProcessManagerInterface
- DEFAULT_SOCKET_PATH = '/var/run/supervisor.sock'¶
- class ProcessState¶
Bases:
object
- BACKOFF = 30¶
- EXITED = 100¶
- FATAL = 200¶
- RUNNING = 20¶
- STARTING = 10¶
- STOPPED = 0¶
- STOPPING = 40¶
- UNKNOWN = 1000¶
- static is_running(state: int) bool ¶
- class RpcFaults¶
Bases:
object
- ABNORMAL_TERMINATION = 40¶
- ALREADY_ADDED = 90¶
- ALREADY_STARTED = 60¶
- BAD_ARGUMENTS = 3¶
- BAD_NAME = 10¶
- BAD_SIGNAL = 11¶
- CANT_REREAD = 92¶
- FAILED = 30¶
- INCORRECT_PARAMETERS = 2¶
- NOT_EXECUTABLE = 21¶
- NOT_RUNNING = 70¶
- NO_FILE = 20¶
- SHUTDOWN_STATE = 6¶
- SIGNATURE_UNSUPPORTED = 4¶
- SPAWN_ERROR = 50¶
- STILL_RUNNING = 91¶
- SUCCESS = 80¶
- UNKNOWN_METHOD = 1¶
- SUPERVISOR_GROUP = 'intelmq'¶
- bot_reload(bot_id: str, getstatus: bool = True)¶
- bot_run(bot_id, run_subcommand=None, console_type=None, message_action_kind=None, dryrun=None, msg=None, show_sent=None, loglevel=None)¶
- bot_start(bot_id: str, getstatus: bool = True)¶
- bot_status(bot_id: str) str ¶
- bot_stop(bot_id: str, getstatus: bool = True)¶
- intelmq.lib.processmanager.process_managers()¶
Collect the process managers in this module that implement the ProcessManagerInterface. Returns a dict with a short identifier of the process manager as key and the class as value: {‘intelmq’: intelmq.lib.processmanager.IntelMQProcessManager, ‘supervisor’: intelmq.lib.processmanager.SupervisorProcessManager}
intelmq.lib.splitreports module¶
Support for splitting large raw reports into smaller ones.
The main intention of this module is to help work around a limitation in Redis, which restricts strings to 512MB. Collector bots can use the functions in this module to split the incoming data into smaller pieces which can be sent as separate reports.
Collectors usually don’t really know anything about the data they collect, so the data cannot be reliably split into pieces in all cases. This module can be used for those cases, though, where users know that the data is actually a line-based format and can easily be split into pieces at newline characters. For this to work, some assumptions are made:
The data can be split at any newline character
This would not work for e.g. CSV-based formats, which allow newlines in values as long as they are within quotes.
The lines are much shorter than the maximum chunk size
Obviously, if this condition does not hold, it may not be possible to split the data into small enough chunks at newline characters.
Other considerations:
To accommodate CSV formats, the code can optionally replicate the first line of the file at the start of all chunks.
The Redis limit applies to the entire IntelMQ report, not just the raw data. The report has some metadata in addition to the raw data, and the raw data is encoded as base64 in the report. The maximum chunk size must take this into account, by multiplying the actual limit by 3/4 and subtracting a generous amount for the metadata.
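The sizing consideration above works out as follows. The 10 MB metadata headroom is an assumed, illustrative value:

```python
# Redis limits a string to 512 MB. The raw data is base64-encoded in the
# report (a 4/3 expansion), so the usable raw size is about 3/4 of the
# limit, minus headroom for the report metadata.
REDIS_LIMIT = 512 * 2**20
METADATA_HEADROOM = 10 * 2**20  # assumed generous allowance
max_chunk_size = REDIS_LIMIT * 3 // 4 - METADATA_HEADROOM
```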
- intelmq.lib.splitreports.generate_reports(report_template: Report, infile: BinaryIO, chunk_size: int | None, copy_header_line: bool) Generator[Report, None, None] ¶
Generate reports from a template and input file, optionally split into chunks.
If chunk_size is None, a single report is generated with the entire contents of infile as the raw data. Otherwise chunk_size should be an integer giving the maximum number of bytes in a chunk. The data read from infile is then split into chunks of this size at newline characters (see read_delimited_chunks). For each of the chunks, this function yields a copy of the report_template with that chunk as the value of the raw attribute.
When splitting the data into chunks, if copy_header_line is true, the first line of the file is read before chunking and then prepended to each of the chunks. This is particularly useful when splitting CSV files.
The infile should be a file-like object. generate_reports uses only two methods, readline and read, with readline only called once and only if copy_header_line is true. Both methods should return bytes objects.
- Params:
report_template: report used as template for all yielded copies infile: stream to read from chunk_size: maximum size of each chunk copy_header_line: copy the first line of the infile to each chunk
- Yields:
report – a Report object holding the chunk in the raw field
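The documented behavior can be sketched with plain dicts in place of Report objects. `generate_report_dicts` is a hypothetical stand-in for illustration only; the real function yields copies of the report_template with the chunk in the raw field:

```python
import io

def generate_report_dicts(template: dict, infile, chunk_size, copy_header_line):
    # Plain-dict sketch of generate_reports' documented behavior
    if chunk_size is None:
        # No splitting: a single report with the entire file contents
        yield {**template, "raw": infile.read()}
        return
    header = infile.readline() if copy_header_line else b""
    chunk = b""
    for line in infile:
        if chunk and len(chunk) + len(line) > chunk_size:
            yield {**template, "raw": header + chunk}
            chunk = b""
        chunk += line
    if chunk:
        yield {**template, "raw": header + chunk}
```

For a CSV input with copy_header_line=True, every yielded report starts with the header line followed by a slice of the data rows.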
- intelmq.lib.splitreports.read_delimited_chunks(infile: BinaryIO, chunk_size: int) Generator[bytes, None, None] ¶
Yield the contents of infile in chunk_size pieces ending at newlines. The individual pieces, except for the last one, end in newlines and are smaller than chunk_size if possible.
- Params:
infile: stream to read from chunk_size: maximum size of each chunk
- Yields:
chunk – chunk with maximum size of chunk_size if possible
- intelmq.lib.splitreports.split_chunks(chunk: bytes, chunk_size: int) List[bytes] ¶
Split a bytestring into chunk_size pieces at ASCII newline characters.
The return value is a list of bytestring objects. Concatenating all of them yields a bytestring equal to the input string. All items in the list except the last item end in a newline. The items are shorter than chunk_size if possible, but may be longer if the input data has places where the distance between two newline characters is too long.
Note in particular that the last item may not end in a newline!
- Params:
chunk: The string to be split chunk_size: maximum size of each chunk
- Returns:
List of resulting chunks
- Return type:
chunks
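The splitting behavior described above can be sketched as follows. `split_at_newlines` is an illustrative reimplementation of the documented contract, not intelmq's split_chunks itself:

```python
def split_at_newlines(data: bytes, chunk_size: int) -> list:
    # Pieces are at most chunk_size where possible; a piece may exceed
    # chunk_size when two newlines are further apart than that.
    chunks = []
    while len(data) > chunk_size:
        cut = data.rfind(b"\n", 0, chunk_size)
        if cut == -1:
            # No newline within the limit: take the next one, however far
            cut = data.find(b"\n", chunk_size)
            if cut == -1:
                break  # no newline at all: keep the remainder as one piece
        chunks.append(data[:cut + 1])
        data = data[cut + 1:]
    if data:
        chunks.append(data)
    return chunks
```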
intelmq.lib.test module¶
Utilities for testing intelmq bots.
The BotTestCase can be used as base class for unittests on bots. It includes some basic generic tests (logged errors, correct pipeline setup).
- class intelmq.lib.test.BotTestCase¶
Bases:
object
Provides common tests and assert methods for bot testing.
- assertAnyLoglineEqual(message: str, levelname: str = 'ERROR')¶
Asserts if any logline matches a specific requirement.
- Parameters:
message – Message text which is compared
levelname – Log level of the logline which is asserted, upper case.
- Raises:
ValueError – if logline message has not been found
- assertLogMatches(pattern: str, levelname: str = 'ERROR')¶
Asserts if any logline matches a specific requirement.
- Parameters:
pattern – Message text which is compared, regular expression.
levelname – Log level of the logline which is asserted, upper case.
- assertLoglineEqual(line_no: int, message: str, levelname: str = 'ERROR')¶
Asserts if a logline matches a specific requirement.
- Parameters:
line_no – Number of the logline which is asserted
message – Message text which is compared
levelname – Log level of logline which is asserted
- assertLoglineMatches(line_no: int, pattern: str, levelname: str = 'ERROR')¶
Asserts if a logline matches a specific requirement.
- Parameters:
line_no – Number of the logline which is asserted
pattern – Message text which is compared
levelname – Log level of the logline which is asserted, upper case.
- assertMessageEqual(queue_pos, expected_msg, compare_raw=True, path='_default')¶
Asserts that the given expected_msg is contained in the generated event at the given queue position.
- assertNotRegexpMatchesLog(pattern)¶
Asserts that pattern doesn’t match against log.
- assertOutputQueueLen(queue_len=0, path='_default')¶
Asserts that the output queue has the expected length.
- assertRegexpMatchesLog(pattern)¶
Asserts that pattern matches against log.
- bot_types = {'collector': 'CollectorBot', 'expert': 'ExpertBot', 'output': 'OutputBot', 'parser': 'ParserBot'}¶
- get_input_internal_queue()¶
Returns the internal input queue of this bot which can be filled with fixture data in setUp()
- get_input_queue()¶
Returns the input queue of this bot which can be filled with fixture data in setUp()
- get_mocked_logger(logger)¶
- get_output_queue(path='_default')¶
Getter for items in the output queues of this bot. Use in TestCase scenarios. If there are multiple queues in a named queue group, all their items are returned chained.
- harmonization = {'event': {'classification.identifier': {'description': 'The lowercase identifier defines the actual software or service (e.g. ``heartbleed`` or ``ntp_version``) or standardized malware name (e.g. ``zeus``). Note that you MAY overwrite this field during processing for your individual setup. This field is not standardized across IntelMQ setups/users.', 'type': 'String'}, 'classification.taxonomy': {'description': 'We recognize the need for the CSIRT teams to apply a static (incident) taxonomy to abuse data. With this goal in mind the type IOC will serve as a basis for this activity. Each value of the dynamic type mapping translates to a an element in the static taxonomy. The European CSIRT teams for example have decided to apply the eCSIRT.net incident classification. The value of the taxonomy key is thus a derivative of the dynamic type above. For more information about check `ENISA taxonomies <http://www.enisa.europa.eu/activities/cert/support/incident-management/browsable/incident-handling-process/incident-taxonomy/existing-taxonomies>`_.', 'length': 100, 'type': 'ClassificationTaxonomy'}, 'classification.type': {'description': 'The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid *type explosion*, which in turn dilutes the business value of the dynamic typing. 
In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.', 'type': 'ClassificationType'}, 'comment': {'description': 'Free text commentary about the abuse event inserted by an analyst.', 'type': 'String'}, 'destination.abuse_contact': {'description': 'Abuse contact for destination address. A comma separated list.', 'type': 'LowercaseString'}, 'destination.account': {'description': 'An account name or email address, which has been identified to relate to the destination of an abuse event.', 'type': 'String'}, 'destination.allocated': {'description': 'Allocation date corresponding to BGP prefix.', 'type': 'DateTime'}, 'destination.as_name': {'description': 'The autonomous system name to which the connection headed.', 'type': 'String'}, 'destination.asn': {'description': 'The autonomous system number to which the connection headed.', 'type': 'ASN'}, 'destination.domain_suffix': {'description': 'The suffix of the domain from the public suffix list.', 'type': 'FQDN'}, 'destination.fqdn': {'description': 'A DNS name related to the host from which the connection originated. DNS allows even binary data in DNS, so we have to allow everything. 
A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'destination.geolocation.cc': {'description': 'Country-Code according to ISO3166-1 alpha-2 for the destination IP.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'destination.geolocation.city': {'description': 'Some geolocation services refer to city-level geolocation.', 'type': 'String'}, 'destination.geolocation.country': {'description': 'The country name derived from the ISO3166 country code (assigned to cc field).', 'type': 'String'}, 'destination.geolocation.latitude': {'description': 'Latitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'destination.geolocation.longitude': {'description': 'Longitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'destination.geolocation.region': {'description': 'Some geolocation services refer to region-level geolocation.', 'type': 'String'}, 'destination.geolocation.state': {'description': 'Some geolocation services refer to state-level geolocation.', 'type': 'String'}, 'destination.ip': {'description': 'The IP which is the target of the observed connections.', 'type': 'IPAddress'}, 'destination.local_hostname': {'description': 'Some sources report an internal hostname within a NAT related to the name configured for a compromised system', 'type': 'String'}, 'destination.local_ip': {'description': 'Some sources report an internal (NATed) IP address related a compromised system. N.B. RFC1918 IPs are OK here.', 'type': 'IPAddress'}, 'destination.network': {'description': 'CIDR for an autonomous system. Also known as BGP prefix. 
If multiple values are possible, select the most specific.', 'type': 'IPNetwork'}, 'destination.port': {'description': 'The port to which the connection headed.', 'type': 'Integer'}, 'destination.registry': {'description': 'The IP registry a given ip address is allocated by.', 'length': 7, 'type': 'Registry'}, 'destination.reverse_dns': {'description': 'Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'destination.tor_node': {'description': 'If the destination IP was a known tor node.', 'type': 'Boolean'}, 'destination.url': {'description': 'A URL denotes on IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource.', 'type': 'URL'}, 'destination.urlpath': {'description': 'The path portion of an HTTP or related network request.', 'type': 'String'}, 'event_description.target': {'description': 'Some sources denominate the target (organization) of a an attack.', 'type': 'String'}, 'event_description.text': {'description': 'A free-form textual description of an abuse event.', 'type': 'String'}, 'event_description.url': {'description': 'A description URL is a link to a further description of the the abuse event in question.', 'type': 'URL'}, 'event_hash': {'description': 'Computed event hash with specific keys and values that identify a unique event. At present, the hash should default to using the SHA1 function. Please note that for an event hash to be able to match more than one event (deduplication) the receiver of an event should calculate it based on a minimal set of keys and values present in the event. 
Using for example the observation time in the calculation will most likely render the checksum useless for deduplication purposes.', 'length': 40, 'regex': '^[A-F0-9./]+$', 'type': 'UppercaseString'}, 'extra': {'description': 'All anecdotal information, which cannot be parsed into the data harmonization elements. E.g. os.name, os.version, etc. **Note**: this is only intended for mapping any fields which can not map naturally into the data harmonization. It is not intended for extending the data harmonization with your own fields.', 'type': 'JSONDict'}, 'feed.accuracy': {'description': 'A float between 0 and 100 that represents how accurate the data in the feed is', 'type': 'Accuracy'}, 'feed.code': {'description': 'Code name for the feed, e.g. DFGS, HSDAG etc.', 'length': 100, 'type': 'String'}, 'feed.documentation': {'description': 'A URL or hint where to find the documentation of this feed.', 'type': 'String'}, 'feed.name': {'description': 'Name for the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.provider': {'description': 'Name for the provider of the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.url': {'description': 'The URL of a given abuse feed, where applicable', 'type': 'URL'}, 'malware.hash.md5': {'description': 'A string depicting an MD5 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.hash.sha1': {'description': 'A string depicting a SHA1 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.hash.sha256': {'description': 'A string depicting a SHA256 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.name': {'description': 'The malware name in lower case.', 'regex': '^[ -~]+$', 'type': 'LowercaseString'}, 'malware.version': {'description': 'A version string for an identified 
artifact generation, e.g. a crime-ware kit.', 'regex': '^[ -~]+$', 'type': 'String'}, 'misp.attribute_uuid': {'description': 'MISP - Malware Information Sharing Platform & Threat Sharing UUID of an attribute.', 'length': 36, 'regex': '^[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}$', 'type': 'LowercaseString'}, 'misp.event_uuid': {'description': 'MISP - Malware Information Sharing Platform & Threat Sharing UUID.', 'length': 36, 'regex': '^[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[0-9a-z]{12}$', 'type': 'LowercaseString'}, 'output': {'description': 'Event data converted into foreign format, intended to be exported by output plugin.', 'type': 'JSON'}, 'protocol.application': {'description': 'e.g. vnc, ssh, sip, irc, http or smtp.', 'length': 100, 'regex': '^[ -~]+$', 'type': 'LowercaseString'}, 'protocol.transport': {'description': 'e.g. tcp, udp, icmp.', 'iregex': '^(ip|icmp|igmp|ggp|ipencap|st2|tcp|cbt|egp|igp|bbn-rcc|nvp(-ii)?|pup|argus|emcon|xnet|chaos|udp|mux|dcn|hmp|prm|xns-idp|trunk-1|trunk-2|leaf-1|leaf-2|rdp|irtp|iso-tp4|netblt|mfe-nsp|merit-inp|sep|3pc|idpr|xtp|ddp|idpr-cmtp|tp\\+\\+|il|ipv6|sdrp|ipv6-route|ipv6-frag|idrp|rsvp|gre|mhrp|bna|esp|ah|i-nlsp|swipe|narp|mobile|tlsp|skip|ipv6-icmp|ipv6-nonxt|ipv6-opts|cftp|sat-expak|kryptolan|rvd|ippc|sat-mon|visa|ipcv|cpnx|cphb|wsn|pvp|br-sat-mon|sun-nd|wb-mon|wb-expak|iso-ip|vmtp|secure-vmtp|vines|ttp|nsfnet-igp|dgp|tcf|eigrp|ospf|sprite-rpc|larp|mtp|ax.25|ipip|micp|scc-sp|etherip|encap|gmtp|ifmp|pnni|pim|aris|scps|qnx|a/n|ipcomp|snp|compaq-peer|ipx-in-ip|vrrp|pgm|l2tp|ddx|iatp|st|srp|uti|smp|sm|ptp|isis|fire|crtp|crdup|sscopmce|iplt|sps|pipe|sctp|fc|divert)$', 'length': 11, 'type': 'LowercaseString'}, 'raw': {'description': 'The original line of the event from encoded in base64.', 'type': 'Base64'}, 'rtir_id': {'description': 'Request Tracker Incident Response ticket id.', 'type': 'Integer'}, 'screenshot_url': {'description': 'Some source may report URLs related to a an image generated of 
a resource without any metadata. Or an URL pointing to resource, which has been rendered into a webshot, e.g. a PNG image and the relevant metadata related to its retrieval/generation.', 'type': 'URL'}, 'source.abuse_contact': {'description': 'Abuse contact for source address. A comma separated list.', 'type': 'LowercaseString'}, 'source.account': {'description': 'An account name or email address, which has been identified to relate to the source of an abuse event.', 'type': 'String'}, 'source.allocated': {'description': 'Allocation date corresponding to BGP prefix.', 'type': 'DateTime'}, 'source.as_name': {'description': 'The autonomous system name from which the connection originated.', 'type': 'String'}, 'source.asn': {'description': 'The autonomous system number from which originated the connection.', 'type': 'ASN'}, 'source.domain_suffix': {'description': 'The suffix of the domain from the public suffix list.', 'type': 'FQDN'}, 'source.fqdn': {'description': 'A DNS name related to the host from which the connection originated. DNS allows even binary data in DNS, so we have to allow everything. 
A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'source.geolocation.cc': {'description': 'Country-Code according to ISO3166-1 alpha-2 for the source IP.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.city': {'description': 'Some geolocation services refer to city-level geolocation.', 'type': 'String'}, 'source.geolocation.country': {'description': 'The country name derived from the ISO3166 country code (assigned to cc field).', 'type': 'String'}, 'source.geolocation.cymru_cc': {'description': 'The country code denoted for the ip by the Team Cymru asn to ip mapping service.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.geoip_cc': {'description': 'MaxMind Country Code (ISO3166-1 alpha-2).', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.latitude': {'description': 'Latitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'source.geolocation.longitude': {'description': 'Longitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'source.geolocation.region': {'description': 'Some geolocation services refer to region-level geolocation.', 'type': 'String'}, 'source.geolocation.state': {'description': 'Some geolocation services refer to state-level geolocation.', 'type': 'String'}, 'source.ip': {'description': 'The ip observed to initiate the connection', 'type': 'IPAddress'}, 'source.local_hostname': {'description': 'Some sources report a internal hostname within a NAT related to the name configured for a compromised system', 'type': 'String'}, 'source.local_ip': {'description': 'Some sources report a internal (NATed) IP address related a compromised system. N.B. RFC1918 IPs are OK here.', 'type': 'IPAddress'}, 'source.network': {'description': 'CIDR for an autonomous system. 
Also known as BGP prefix. If multiple values are possible, select the most specific.', 'type': 'IPNetwork'}, 'source.port': {'description': 'The port from which the connection originated.', 'length': 5, 'type': 'Integer'}, 'source.registry': {'description': 'The IP registry a given ip address is allocated by.', 'length': 7, 'type': 'Registry'}, 'source.reverse_dns': {'description': 'Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'source.tor_node': {'description': 'If the source IP was a known tor node.', 'type': 'Boolean'}, 'source.url': {'description': 'A URL denotes an IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource.', 'type': 'URL'}, 'source.urlpath': {'description': 'The path portion of an HTTP or related network request.', 'type': 'String'}, 'status': {'description': 'Status of the malicious resource (phishing, dropzone, etc), e.g. online, offline.', 'type': 'String'}, 'time.observation': {'description': 'The time the collector of the local instance processed (observed) the event.', 'type': 'DateTime'}, 'time.source': {'description': 'The time of occurrence of the event as reported the feed (source).', 'type': 'DateTime'}, 'tlp': {'description': 'Traffic Light Protocol level of the event.', 'type': 'TLP'}}, 'report': {'extra': {'description': 'All anecdotal information of the report, which cannot be parsed into the data harmonization elements. E.g. subject of mails, etc. 
This is data is not automatically propagated to the events.', 'type': 'JSONDict'}, 'feed.accuracy': {'description': 'A float between 0 and 100 that represents how accurate the data in the feed is', 'type': 'Accuracy'}, 'feed.code': {'description': 'Code name for the feed, e.g. DFGS, HSDAG etc.', 'length': 100, 'type': 'String'}, 'feed.documentation': {'description': 'A URL or hint where to find the documentation of this feed.', 'type': 'String'}, 'feed.name': {'description': 'Name for the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.provider': {'description': 'Name for the provider of the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.url': {'description': 'The URL of a given abuse feed, where applicable', 'type': 'URL'}, 'raw': {'description': 'The original raw and unparsed data encoded in base64.', 'type': 'Base64'}, 'rtir_id': {'description': 'Request Tracker Incident Response ticket id.', 'type': 'Integer'}, 'time.observation': {'description': 'The time the collector of the local instance processed (observed) the event.', 'type': 'DateTime'}}}¶
- property input_queue¶
Returns the input queue of this bot, which can be filled with fixture data in setUp().
- new_event()¶
- new_report(auto=False, examples=False)¶
- prepare_bot(parameters={}, destination_queues=None, prepare_source_queue: bool = True)¶
Reconfigures the bot with the changed attributes.
- Parameters:
parameters – optional bot parameters for this run, as dict
destination_queues – optional definition of destination queues default: {“_default”: “{}-output”.format(self.bot_id)}
- prepare_source_queue()¶
- run_bot(iterations: int = 1, error_on_pipeline: bool = False, prepare=True, parameters={}, allowed_error_count=0, allowed_warning_count=0, stop_bot: bool = True, expected_internal_queue_size: int = 0)¶
Call this method to perform an actual test run of the specified bot.
- Parameters:
iterations – Bot instance will be run the given times, defaults to 1.
parameters – passed to prepare_bot
allowed_error_count – maximum number of allowed errors in the logs
allowed_warning_count – maximum number of allowed warnings in the logs
stop_bot – Whether the bot should be stopped/shut down after running. Set to False if you are calling this method again afterwards, as the bot shutdown destroys structures (pipeline, etc.)
- classmethod setUpClass()¶
Set default values and save original functions.
- set_input_queue(seq)¶
Setter for the input queue of this bot
- tearDown()¶
Check if the bot did consume all messages.
Executed after every test run.
- classmethod tearDownClass()¶
- test_bot_name(*args, **kwargs)¶
Test if Bot has a valid name. Must be CamelCase and end with CollectorBot etc.
Accept arbitrary arguments in case the test methods get mocked and get some additional arguments. All arguments are ignored.
- test_static_bot_check_method(*args, **kwargs)¶
Check if the bot’s static check() method completes without errors (exceptions). The return value (errors) is not checked.
The arbitrary parameters for this test function are needed because if a mocker mocks the test class, parameters can be added. See for example intelmq.tests.bots.collectors.http.test_collector.
intelmq.lib.upgrades module¶
© 2020 Sebastian Wagner <wagner@cert.at>
SPDX-License-Identifier: AGPL-3.0-or-later
- intelmq.lib.upgrades.v100_dev7_modify_syntax(configuration, harmonization, dry_run, **kwargs)¶
Migrate modify bot configuration format
- intelmq.lib.upgrades.v110_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Checking for deprecated runtime configurations (stomp collector, cymru parser, ripe expert, collector feed parameter)
- intelmq.lib.upgrades.v110_shadowserver_feednames(configuration, harmonization, dry_run, **kwargs)¶
Replace deprecated Shadowserver feednames
- intelmq.lib.upgrades.v111_defaults_process_manager(configuration, harmonization, dry_run, **kwargs)¶
Fix typo in proccess_manager parameter
- intelmq.lib.upgrades.v112_feodo_tracker_domains(configuration, harmonization, dry_run, **kwargs)¶
Search for discontinued feodotracker domains feed
- intelmq.lib.upgrades.v112_feodo_tracker_ips(configuration, harmonization, dry_run, **kwargs)¶
Fix URL of feodotracker IPs feed in runtime configuration
- intelmq.lib.upgrades.v200_defaults_broker(configuration, harmonization, dry_run, **kwargs)¶
Insert *_pipeline_broker into, and delete broker from, the defaults configuration
- intelmq.lib.upgrades.v200_defaults_ssl_ca_certificate(configuration, harmonization, dry_run, **kwargs)¶
Add ssl_ca_certificate to defaults
- intelmq.lib.upgrades.v200_defaults_statistics(configuration, harmonization, dry_run, **kwargs)¶
Inserting statistics_* parameters into defaults configuration file
- intelmq.lib.upgrades.v202_fixes(configuration, harmonization, dry_run, **kwargs)¶
Migrate Collector parameter feed to name. RIPE expert set query_ripe_stat_ip with query_ripe_stat_asn as default. Set cymru whois expert overwrite to true.
- intelmq.lib.upgrades.v210_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Migrating configuration
- intelmq.lib.upgrades.v213_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Migrate attach_unzip to extract_files for the mail attachment collector
- intelmq.lib.upgrades.v213_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feed configuration for changed feed parameters.
- intelmq.lib.upgrades.v220_azure_collector(configuration, harmonization, dry_run, **kwargs)¶
Checking for the Microsoft Azure collector
- intelmq.lib.upgrades.v220_configuration(configuration, harmonization, dry_run, **kwargs)¶
Migrating configuration
- intelmq.lib.upgrades.v220_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feed configuration for changed feed parameters.
- intelmq.lib.upgrades.v221_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feeds’ configuration for changed/fixed parameters. Deprecation of HP Hosts file feed & parser.
- intelmq.lib.upgrades.v222_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrate Shadowserver feed name
- intelmq.lib.upgrades.v230_csv_parser_parameter_fix(configuration, harmonization, dry_run, **kwargs)¶
Fix CSV parser parameter misspelling
- intelmq.lib.upgrades.v230_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Deprecate malwaredomainlist parser
- intelmq.lib.upgrades.v230_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feeds’ configuration for changed/fixed parameter
- intelmq.lib.upgrades.v233_feodotracker_browse(configuration, harmonization, dry_run, **kwargs)¶
Migrate Abuse.ch Feodotracker Browser feed parsing parameters
- intelmq.lib.upgrades.v300_bots_file_removal(configuration, harmonization, dry_run, **kwargs)¶
Remove BOTS file
- intelmq.lib.upgrades.v300_defaults_file_removal(configuration, harmonization, dry_run, **kwargs)¶
Remove the defaults.conf file
- intelmq.lib.upgrades.v300_pipeline_file_removal(configuration, harmonization, dry_run, **kwargs)¶
Remove the pipeline.conf file
- intelmq.lib.upgrades.v301_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Deprecate malwaredomains parser and collector
- intelmq.lib.upgrades.v310_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feeds’ configuration for changed/fixed parameter
- intelmq.lib.upgrades.v310_shadowserver_feednames(configuration, harmonization, dry_run, **kwargs)¶
Remove legacy Shadowserver feednames
- intelmq.lib.upgrades.v320_update_turris_greylist_url(configuration, harmonization, dry_run, **kwargs)¶
Updates Turris Greylist feed URL.
intelmq.lib.utils module¶
Common utility functions for intelmq.
decode, encode, base64_decode, base64_encode, load_configuration, log, reverse_readline, parse_logline
- class intelmq.lib.utils.RewindableFileHandle(f, condition: ~typing.Callable | None = <function RewindableFileHandle.<lambda>>)¶
Bases:
object
Can be used to easily retrieve the last input line, e.g. to populate the raw field during CSV parsing and to handle filtering.
- intelmq.lib.utils.base64_decode(value: bytes | str) str ¶
- Parameters:
value – base64 encoded string
- Returns:
decoded string
- Return type:
retval
Notes
Possible bytes/unicode conversion problems are ignored.
- intelmq.lib.utils.base64_encode(value: bytes | str) str ¶
- Parameters:
value – string to be encoded
- Returns:
base64 representation of value
- Return type:
retval
Notes
Possible bytes/unicode conversion problems are ignored.
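The documented behavior of base64_decode and base64_encode (accept both str and bytes, return str, ignore bytes/unicode conversion problems) can be sketched with the standard library alone. The _sketch names are hypothetical stand-ins, not the IntelMQ implementations:

```python
import base64


def base64_encode_sketch(value) -> str:
    """Sketch of intelmq.lib.utils.base64_encode: str or bytes in, base64 str out."""
    if isinstance(value, str):
        # conversion problems are ignored, per the documented Notes
        value = value.encode('utf-8', errors='ignore')
    return base64.b64encode(value).decode('ascii')


def base64_decode_sketch(value) -> str:
    """Sketch of intelmq.lib.utils.base64_decode: base64 str or bytes in, decoded str out."""
    if isinstance(value, str):
        value = value.encode('ascii', errors='ignore')
    return base64.b64decode(value).decode('utf-8', errors='ignore')
```

This round-trips cleanly: `base64_decode_sketch(base64_encode_sketch('hello'))` returns `'hello'`.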
- intelmq.lib.utils.decode(text: bytes | str, encodings: Sequence[str] = ('utf-8',), force: bool = False) str ¶
Decode given string to UTF-8 (default).
- Parameters:
text – if unicode string is given, same object is returned
encodings – list/tuple of encodings to use
force – Ignore invalid characters
- Returns:
converted unicode string
- Raises:
ValueError – if decoding failed
- intelmq.lib.utils.encode(text: bytes | str, encodings: Sequence[str] = ('utf-8',), force: bool = False) bytes ¶
Encode given string from UTF-8 (default).
- Parameters:
text – if bytes string is given, same object is returned
encodings – list/tuple of encodings to use
force – Ignore invalid characters
- Returns:
converted bytes string
- Raises:
ValueError – if encoding failed
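The decode/encode contract above (try each listed encoding in order, return unchanged input of the target type, optionally force by ignoring invalid characters, otherwise raise ValueError) can be illustrated with a minimal stdlib sketch; the function name is a hypothetical stand-in for intelmq.lib.utils.decode:

```python
from typing import Sequence, Union


def decode_sketch(text: Union[bytes, str],
                  encodings: Sequence[str] = ('utf-8',),
                  force: bool = False) -> str:
    """Sketch of the documented decode behavior."""
    if isinstance(text, str):
        # if a unicode string is given, the same object is returned
        return text
    for encoding in encodings:
        try:
            return text.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    if force:
        # ignore invalid characters with the first usable encoding
        for encoding in encodings:
            try:
                return text.decode(encoding, errors='ignore')
            except LookupError:
                continue
    raise ValueError('Could not decode string with given encodings '
                     f'{encodings!r}.')
```

encode works symmetrically in the other direction (str to bytes).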
- intelmq.lib.utils.error_message_from_exc(exc: Exception) str ¶
>>> exc = IndexError('This is a test')
>>> error_message_from_exc(exc)
'This is a test'
- Parameters:
exc –
- Returns:
The error message of exc
- Return type:
result
- intelmq.lib.utils.file_name_from_response(response: Response) str ¶
Extract the file name from the Content-Disposition header of the Response object, or from the URL as a fallback
- Parameters:
response – a Response object retrieved from a call with the requests library
- Returns:
The file name
- Return type:
file_name
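The extraction logic can be sketched without the requests library by operating on a headers dict plus the URL; file_name_from_headers_sketch is a hypothetical helper, not the IntelMQ function, which takes a Response object instead:

```python
import re
from urllib.parse import urlsplit


def file_name_from_headers_sketch(headers: dict, url: str) -> str:
    """Sketch: prefer the Content-Disposition filename, fall back to the URL path."""
    content_disposition = headers.get('Content-Disposition', '')
    match = re.search(r'filename="?([^";]+)"?', content_disposition)
    if match:
        return match.group(1)
    # fallback: last path segment of the URL
    return urlsplit(url).path.rsplit('/', 1)[-1]
```

With a header of `attachment; filename="report.csv"` this yields `report.csv`; without the header, a URL ending in `/data.json` yields `data.json`.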
- intelmq.lib.utils.get_global_settings() dict ¶
- intelmq.lib.utils.list_all_bots() dict ¶
Compile a dictionary with all bots and their parameters.
Includes:
- the bots’ names
- the description from the docstring
- parameters including default values
For the parameters, parameters of the Bot class are excluded if they have the same value.
- intelmq.lib.utils.load_configuration(configuration_filepath: str) dict ¶
Load JSON or YAML configuration file.
- Parameters:
configuration_filepath – Path to file to load.
- Returns:
Parsed configuration
- Return type:
config
- Raises:
ValueError – if file not found
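The documented behavior (parse the file, raise ValueError rather than FileNotFoundError when it is missing) can be sketched for the JSON case; the real function also handles YAML, and the _sketch name is a hypothetical stand-in:

```python
import json


def load_configuration_sketch(configuration_filepath: str) -> dict:
    """Sketch of load_configuration, JSON only."""
    try:
        with open(configuration_filepath, encoding='utf-8') as handle:
            return json.load(handle)
    except FileNotFoundError:
        # the documented contract raises ValueError, not FileNotFoundError
        raise ValueError(f'File not found: {configuration_filepath!r}.')
```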
- intelmq.lib.utils.load_parameters(*configs: dict) Parameters ¶
Load dictionaries into new Parameters() instance.
- Parameters:
*configs – Arbitrary number of dictionaries to load.
- Returns:
class instance with items of configs as attributes
- Return type:
parameters
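Conceptually, load_parameters just sets each key of each dict as an attribute on a fresh Parameters instance, with later dicts overriding earlier ones. A minimal sketch, where the Parameters class below is a simplified stand-in for IntelMQ's:

```python
class Parameters:
    """Stand-in for intelmq.lib.utils.Parameters: a plain attribute container."""


def load_parameters_sketch(*configs: dict) -> Parameters:
    """Sketch: merge dicts into a Parameters instance, last one wins."""
    parameters = Parameters()
    for config in configs:
        for key, value in config.items():
            setattr(parameters, key, value)
    return parameters
```

For example, `load_parameters_sketch({'rate_limit': 60}, {'rate_limit': 10})` yields an object whose `rate_limit` attribute is 10.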
- intelmq.lib.utils.log(name: str, log_path: str | bool = '/opt/intelmq/var/log/', log_level: str = 'INFO', stream: object | None = None, syslog: bool | str | list | tuple = None, log_format_stream: str = '%(name)s: %(message)s', logging_level_stream: str | None = None, log_max_size: int | None = 0, log_max_copies: int | None = None)¶
- intelmq.lib.utils.parse_logline(logline: str, regex: str = '^(?P<date>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d+) - (?P<bot_id>([-\\w]+|py\\.warnings))(?P<thread_id>\\.[0-9]+)? - (?P<log_level>[A-Z]+) - (?P<message>.+)$') dict | str ¶
Parses the given logline string into its components.
- Parameters:
logline – logline to be parsed
regex – The regular expression used to parse the line
- Returns:
- dictionary with keys: [‘date’, ‘bot_id’, ‘log_level’, ‘message’]
or string if the line can’t be parsed
- Return type:
result
See also
LOG_REGEX: Regular expression for default log format of file handler SYSLOG_REGEX: Regular expression for log format of syslog
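The default regex shown in the signature above is enough to reproduce the documented behavior with the standard library; parse_logline_sketch is a hypothetical stand-in for the IntelMQ function:

```python
import re

# default log-format regex from the parse_logline signature
LOG_REGEX = (r'^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+) - '
             r'(?P<bot_id>([-\w]+|py\.warnings))(?P<thread_id>\.[0-9]+)? - '
             r'(?P<log_level>[A-Z]+) - (?P<message>.+)$')


def parse_logline_sketch(logline: str, regex: str = LOG_REGEX):
    """Sketch: return the named groups as a dict, or the line unchanged if unparseable."""
    match = re.match(regex, logline)
    if match is None:
        return logline
    return match.groupdict()


line = '2022-01-01 12:00:00,123 - example-bot - INFO - Bot initialization completed.'
parsed = parse_logline_sketch(line)
```

Here `parsed['bot_id']` is `'example-bot'` and `parsed['log_level']` is `'INFO'`; a non-matching line is returned as-is.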
- intelmq.lib.utils.parse_relative(relative_time: str) int ¶
Parse relative time attributes and returns the corresponding minutes.
>>> parse_relative('4 hours')
240
- Parameters:
relative_time – a string holding a relative time specification
- Returns:
Minutes
- Return type:
result
- Raises:
ValueError – If relative_time is not parseable
See also
TIMESPANS: Defines the conversion of verbal timespans to minutes
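A sketch of the documented contract, with an illustrative TIMESPANS table (the values here are common-sense conversions, not necessarily IntelMQ's exact table) and a hypothetical _sketch name:

```python
# illustrative conversion table: verbal timespan -> minutes (assumption)
TIMESPANS = {'minute': 1, 'hour': 60, 'day': 24 * 60, 'week': 7 * 24 * 60}


def parse_relative_sketch(relative_time: str) -> int:
    """Sketch: '4 hours' -> 240; raises ValueError if unparseable."""
    try:
        number, unit = relative_time.strip().split()
        return int(number) * TIMESPANS[unit.rstrip('s')]
    except (ValueError, KeyError) as exc:
        raise ValueError(f'Could not parse {relative_time!r}: {exc!r}')
```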
- intelmq.lib.utils.reverse_readline(filename: str, buf_size=100000) Generator[str, None, None] ¶
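The generator contract (yield the file's lines last-to-first) can be shown with a simplified stdlib sketch; the real implementation seeks backwards through the file in buf_size chunks instead of reading it whole, and the _sketch name is a hypothetical stand-in:

```python
from typing import Generator


def reverse_readline_sketch(filename: str, buf_size: int = 100000) -> Generator[str, None, None]:
    """Sketch: yield lines of the file in reverse order (reads the whole file)."""
    with open(filename, encoding='utf-8') as handle:
        lines = handle.read().splitlines()
    yield from reversed(lines)
```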