intelmq.lib package

Subpackages

Submodules

intelmq.lib.bot module

The bot library has the base classes for all bots.
  • Bot: generic base class for all kinds of bots

  • CollectorBot: base class for collectors

  • ParserBot: base class for parsers

  • SQLBot: base class for any bots using SQL

class intelmq.lib.bot.Bot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)

Bases: object

Not to be reset when initialized again on reload.

classmethod _create_argparser()

See https://github.com/certtools/intelmq/pull/1524/files#r464606370 for why this code is not in the constructor.

_parse_common_parameters()

Parses and sanitizes commonly used parameters:

  • extract_files

_parse_extract_file_parameter(parameter_name: str = 'extract_files')

Parses and sanitizes commonly used parameters:

  • extract_files

accuracy: int = 100
acknowledge_message()

Acknowledges that the last message has been processed, if any.

For bots without source pipeline (collectors), this is a no-op.

static check(parameters: dict) Optional[List[List[str]]]

The bot’s own check function can perform individual checks on its parameters. init() is not called beforehand; this is a static method which does not require class initialization.

Parameters

parameters – Bot’s parameters, defaults and runtime merged together

Returns

None or a list of [log_level, log_message] pairs, both strings. log_level must be a valid log level.

Return type

output
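To illustrate the documented return format, the following is a minimal stand-alone sketch of a check function. It does not import IntelMQ; the parameter name `api_key`, the chosen log levels and the message texts are assumptions made only for this example.

```python
from typing import List, Optional


def check(parameters: dict) -> Optional[List[List[str]]]:
    """Return None if nothing was found, else [log_level, log_message] pairs."""
    findings = []
    # 'api_key' is a hypothetical parameter used only for this sketch.
    if not parameters.get("api_key"):
        findings.append(["error", "Parameter 'api_key' is missing or empty."])
    # 'rate_limit' appears in the documented Bot attributes (default 0).
    if parameters.get("rate_limit", 0) < 0:
        findings.append(["warning", "Parameter 'rate_limit' should not be negative."])
    return findings or None
```

Because the function only receives the merged parameters, it can run without a bot instance, which matches the note that init() is not called before it.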

description: Optional[str] = None
destination_pipeline_broker: str = 'redis'
destination_pipeline_db: int = 2
destination_pipeline_host: str = '127.0.0.1'
destination_pipeline_password: Optional[str] = None
destination_pipeline_port: int = 6379
destination_queues: dict = {}
error_dump_message: bool = True
error_log_exception: bool = True
error_log_message: bool = False
error_max_retries: int = 3
error_procedure: str = 'pass'
error_retry_delay: int = 15
group: Optional[str] = None
http_proxy: Optional[str] = None
http_timeout_max_tries: int = 3
http_timeout_sec: int = 30
http_user_agent: str = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'
http_verify_cert: Union[bool, str] = True
https_proxy: Optional[str] = None
init()
instances_threads: int = 0
is_multithreaded: bool = False
load_balance: bool = False
log_processed_messages_count: int = 500
log_processed_messages_seconds: int = 900
logging_handler: str = 'file'
logging_level: str = 'INFO'
logging_path: str = '/opt/intelmq/var/log/'
logging_syslog: str = '/dev/log'
module = None
name: Optional[str] = None
new_event(*args, **kwargs)
process_manager: str = 'intelmq'
rate_limit: int = 0
receive_message() intelmq.lib.message.Message

If the bot is reloaded when waiting for an incoming message, the received message will be rejected to the pipeline in the first place to get to a clean state. Then, after reloading, the message will be retrieved again.

classmethod run(parsed_args=None)
send_message(*messages, path: str = '_default', auto_add=None, path_permissive: bool = False)
Parameters
  • messages – Instances of intelmq.lib.message.Message class

  • auto_add – ignored

  • path_permissive – If true, do not raise an error if the path is not configured

set_request_parameters()
shutdown()
source_pipeline_broker: str = 'redis'
source_pipeline_db: int = 2
source_pipeline_host: str = '127.0.0.1'
source_pipeline_password: Optional[str] = None
source_pipeline_port: int = 6379
source_queue: Optional[str] = None
ssl_ca_certificate: Optional[str] = None
start(starting: bool = True, error_on_pipeline: bool = True, error_on_message: bool = False, source_pipeline: Optional[str] = None, destination_pipeline: Optional[str] = None)
statistics_database: int = 3
statistics_host: str = '127.0.0.1'
statistics_password: Optional[str] = None
statistics_port: int = 6379
stop(exitcode: int = 1)
class intelmq.lib.bot.CollectorBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)

Bases: intelmq.lib.bot.Bot

Base class for collectors.

Does some sanity checks on message sending.

accuracy: int = 100
code: Optional[str] = None
documentation: Optional[str] = None
name: Optional[str] = None
new_report()
provider: Optional[str] = None
send_message(*messages, path: str = '_default', auto_add: bool = True)

Parameters
  • messages – Instances of intelmq.lib.message.Message class

  • path – Named queue the message will be sent to

  • auto_add – Add some default report fields from parameters

class intelmq.lib.bot.OutputBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)

Bases: intelmq.lib.bot.Bot

Base class for outputs.

export_event(event: intelmq.lib.message.Event, return_type: Optional[type] = None) Union[str, dict]
Exports an event according to the following parameters:
  • message_hierarchical

  • message_with_type

  • message_jsondict_as_string

  • single_key

  • keep_raw_field

Parameters

return_type – Ensure that the returned value is of the given type. Optional. For example: str. If the resulting value is not an instance of this type, the given object is called with the value as parameter, e.g. str(retval).
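The return_type rule described above can be sketched in a few lines. The helper name `coerce` is hypothetical and only mirrors the documented behavior; it is not the library's implementation.

```python
from typing import Any, Optional


def coerce(retval: Any, return_type: Optional[type] = None) -> Any:
    """Call return_type on retval unless retval already is an instance of it."""
    if return_type is not None and not isinstance(retval, return_type):
        return return_type(retval)
    return retval
```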

class intelmq.lib.bot.ParserBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)

Bases: intelmq.lib.bot.Bot

parse(report: intelmq.lib.message.Report)

A generator yielding the single elements of the data.

Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).

Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:

parse = ParserBot.parse_csv
You should do that for recovering lines too.

recover_line = ParserBot.recover_line_csv

parse_csv(report: intelmq.lib.message.Report)

A basic CSV parser.

parse_csv_dict(report: intelmq.lib.message.Report)

A basic CSV Dictionary parser.

parse_json(report: intelmq.lib.message.Report)

A basic JSON parser. Assumes a list of objects as input, each of which is yielded.

parse_json_stream(report: intelmq.lib.message.Report)

A JSON stream parser (one JSON data structure per line).

parse_line(line: Any, report: intelmq.lib.message.Report)

A generator which can yield one or more messages contained in line.

Report has the full message, thus you can access some metadata. Override for your use.

process()
recover_line(line: Optional[str] = None) str

Reverse of “parse” for single lines.

Recovers a fully functional report with only the problematic line by concatenating all strings in “self.tempdata” with “line” with LF newlines. Works fine for most text files.

Parameters

line (Optional[str], optional) – The currently processed line, which should be transformed back into its original appearance. As a fallback, “self._current_line” is used if available (depending on self.parse). The default is None.

Raises

ValueError – If neither the parameter “line” nor the member “self._current_line” is available.

Returns

The reconstructed raw data.

Return type

str
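The recovery step described above (joining the saved lines with the problematic line using LF newlines) can be sketched as a free function. Here `tempdata` is passed in explicitly instead of being read from self.tempdata, which is a simplification for this example.

```python
from typing import List


def recover_line(tempdata: List[str], line: str) -> str:
    """Join saved header/comment lines and the problematic line with LF newlines."""
    return "\n".join(tempdata + [line])
```

The result is a small report that contains the original header lines plus only the one line that failed to parse.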

recover_line_csv(line: str) str
recover_line_csv_dict(line: str) str

Converts dictionaries to CSV. self.csv_fieldnames must be a list of fields.

recover_line_json(line: dict) str

Reverse of parse for JSON pulses.

Recovers a fully functional report with only the problematic pulse.

recover_line_json_stream(line=None) str

recover_line for json streams, just returns the current line, unparsed.

Parameters

line – None, not required, only for compatibility with other recover_line methods

Returns

Unparsed JSON line.

Return type

str

class intelmq.lib.bot.SQLBot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: Optional[bool] = None)

Bases: intelmq.lib.bot.Bot

Inherit this bot so that it handles DB connection for you. You do not have to bother:

  • connecting database in the self.init() method, just call super().init(), self.cur will be set

  • catching exceptions, just call self.execute() instead of self.cur.execute()

  • self.format_char will be set to ‘%s’ in PostgreSQL and to ‘?’ in SQLite
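The purpose of format_char can be illustrated with the standard library's sqlite3 module alone. This sketch does not use SQLBot itself; the table name `events` and its columns are hypothetical.

```python
import sqlite3

# '?' is the SQLite placeholder; the text above states '%s' for PostgreSQL.
format_char = "?"

connection = sqlite3.connect(":memory:")
cursor = connection.cursor()
cursor.execute("CREATE TABLE events (ip TEXT, port INTEGER)")  # hypothetical table

# Build the placeholders from format_char instead of hard-coding them,
# so the same query template works for either engine.
query = "INSERT INTO events (ip, port) VALUES ({0}, {0})".format(format_char)
cursor.execute(query, ("192.0.2.1", 80))
connection.commit()

cursor.execute("SELECT ip, port FROM events WHERE port = {}".format(format_char), (80,))
rows = cursor.fetchall()
```

Only the placeholder is formatted into the query string; the values themselves are always passed separately as a tuple.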

POSTGRESQL = 'postgresql'
SQLITE = 'sqlite'
default_engine = 'postgresql'
execute(query: str, values: tuple, rollback=False)
init()

intelmq.lib.bot_debugger module

Utilities for debugging intelmq bots.

BotDebugger is called via intelmqctl. It starts a live running bot instance, raises the logging level to DEBUG, and lets even an inexperienced programmer, who may be puzzled by Python nuances and server deployment twists, see what is happening in the bot and where the error is.

Depending on the subcommand received, the class either
  • starts the bot as is (default)

  • processes a single message, either injected or from the default pipeline (process subcommand)

  • reads a message from the input pipeline or sends a message to the output pipeline (message subcommand)

class intelmq.lib.bot_debugger.BotDebugger(runtime_configuration, bot_id, run_subcommand=None, console_type=None, message_kind=None, dryrun=None, msg=None, show=None, loglevel=None)

Bases: object

EXAMPLE = '\nThe message may look like:\n    \'{"source.network": "178.72.192.0/18", "time.observation": "2017-05-12T05:23:06+00:00"}\' '
arg2msg(msg)
instance = None
leverageLogger(level)
load_configuration() dict

Load JSON or YAML configuration file.

Parameters

configuration_filepath – Path to file to load.

Returns

Parsed configuration

Return type

config

Raises

ValueError – if file not found

static load_configuration_patch(configuration_filepath: str, *args, **kwargs) dict

Mock function for utils.load_configuration which ensures the logging level parameter is set to the value we want. If a runtime configuration is detected, the logging_level parameter is:

  • inserted in all bots’ parameters. bot_id is not accessible here, hence we add it everywhere

  • inserted in the global parameters (ex-defaults).

Maybe not everything is necessary, but this way logging_level is set everywhere it might be relevant, also in the future.

logging_level = None
messageWizzard(msg)
output = []
outputappend(msg)
static pprint(msg) str

We can’t use standard pprint as JSON standard asks for double quotes.

run() str

intelmq.lib.cache module

Cache is a set of information already seen by the system. It provides a way, for example, to remove duplicated events and reports in the system, or to cache results from experts like Cymru Whois. A TTL value can be defined for each piece of information inserted into the cache; the TTL determines how long the system keeps that information in the cache.

class intelmq.lib.cache.Cache(host: str, port: int, db: str, ttl: int, password: Optional[str] = None)

Bases: object

exists(key: str)
flush()

Flushes the currently opened database by calling FLUSHDB.

get(key: str)
set(key: str, value: Any, ttl: Optional[int] = None)
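The per-entry TTL idea can be sketched with an in-memory stand-in. The class name `TTLCache`, the injectable `clock` argument and the lazy expiry are assumptions made for this example; this is not the documented Cache class.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-memory stand-in illustrating a default TTL with per-entry override."""

    def __init__(self, ttl: int, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so the expiry can be tested without sleeping
        self._data: Dict[str, Tuple[Any, float]] = {}

    def set(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
        # Fall back to the default TTL when none is given for this entry.
        lifetime = self.ttl if ttl is None else ttl
        self._data[key] = (value, self.clock() + lifetime)

    def get(self, key: str) -> Any:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self.clock() >= expires_at:
            del self._data[key]  # expired entries are dropped lazily on access
            return None
        return value

    def exists(self, key: str) -> bool:
        return self.get(key) is not None
```

A deduplication step would call exists() for an incoming key and set() it only when it has not been seen within the TTL window.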

intelmq.lib.exceptions module

IntelMQ Exception Class

exception intelmq.lib.exceptions.ConfigurationError(config: str, argument: str)

Bases: intelmq.lib.exceptions.IntelMQException

exception intelmq.lib.exceptions.IntelMQException(message)

Bases: Exception

exception intelmq.lib.exceptions.IntelMQHarmonizationException(message)

Bases: intelmq.lib.exceptions.IntelMQException

exception intelmq.lib.exceptions.InvalidArgument(argument: Any, got: Optional[Any] = None, expected=None, docs: Optional[str] = None)

Bases: intelmq.lib.exceptions.IntelMQException

exception intelmq.lib.exceptions.InvalidKey(key: str)

Bases: intelmq.lib.exceptions.IntelMQHarmonizationException, KeyError

exception intelmq.lib.exceptions.InvalidValue(key: str, value: str, reason: Optional[Any] = None, object: Optional[bytes] = None)

Bases: intelmq.lib.exceptions.IntelMQHarmonizationException

exception intelmq.lib.exceptions.KeyExists(key: str)

Bases: intelmq.lib.exceptions.IntelMQHarmonizationException

exception intelmq.lib.exceptions.KeyNotExists(key: str)

Bases: intelmq.lib.exceptions.IntelMQHarmonizationException

exception intelmq.lib.exceptions.MissingDependencyError(dependency: str, version: Optional[str] = None, installed: Optional[str] = None, additional_text: Optional[str] = None)

Bases: intelmq.lib.exceptions.IntelMQException

A missing dependency was detected. Log instructions on installation.

__init__(dependency: str, version: Optional[str] = None, installed: Optional[str] = None, additional_text: Optional[str] = None)
Parameters
  • dependency (str) – The dependency name.

  • version (Optional[str], optional) – The required version. The default is None.

  • installed (Optional[str], optional) – The currently installed version. Requires ‘version’ to be given. The default is None.

  • additional_text (Optional[str], optional) – Arbitrary additional text to show. The default is None.

IntelMQException: with prepared text

exception intelmq.lib.exceptions.PipelineError(argument: Union[str, Exception])

Bases: intelmq.lib.exceptions.IntelMQException

intelmq.lib.harmonization module

The following types are implemented with sanitize() and is_valid() functions:

  • Base64

  • Boolean

  • ClassificationTaxonomy

  • ClassificationType

  • DateTime

  • FQDN

  • Float

  • Accuracy

  • GenericType

  • IPAddress

  • IPNetwork

  • Integer

  • JSON

  • JSONDict

  • LowercaseString

  • Registry

  • String

  • URL

  • ASN

  • UppercaseString

  • TLP

class intelmq.lib.harmonization.ASN

Bases: intelmq.lib.harmonization.Integer

ASN type. Derived from Integer with forbidden values.

Only valid are: 0 < asn <= 4294967295

See https://en.wikipedia.org/wiki/Autonomous_system_(Internet)

> The first and last ASNs of the original 16-bit integers, namely 0 and 65,535, and the last ASN of the 32-bit numbers, namely 4,294,967,295 are reserved and should not be used by operators.

static check_asn(value: int) bool
static is_valid(value: int, sanitize: bool = False) bool
static sanitize(value: int) Optional[int]
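The documented range for the ASN type can be expressed as a small stand-alone check. Rejecting bool values follows the note on the Integer type in this module; the rest is only the stated inequality.

```python
def check_asn(value: int) -> bool:
    """True for integers in the documented range 0 < asn <= 4294967295."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return 0 < value <= 4294967295
```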
class intelmq.lib.harmonization.Accuracy

Bases: intelmq.lib.harmonization.Float

Accuracy type. A Float between 0 and 100.

static is_valid(value: float, sanitize: bool = False) bool
static sanitize(value: float) Optional[float]
class intelmq.lib.harmonization.Base64

Bases: intelmq.lib.harmonization.String

Base64 type. Always gives unicode strings.

Sanitation encodes to base64 and accepts binary and unicode strings.

static is_valid(value: str, sanitize: bool = False) bool
static sanitize(value: str) Optional[str]
class intelmq.lib.harmonization.Boolean

Bases: intelmq.lib.harmonization.GenericType

Boolean type. Without sanitation only python bool is accepted.

Sanitation accepts string ‘true’ and ‘false’ and integers 0 and 1.

static is_valid(value: bool, sanitize: bool = False) bool
static sanitize(value: bool) Optional[bool]
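A minimal sketch of the sanitation rule for the Boolean type described above, written as a free function. Matching the strings case-insensitively and stripping whitespace are assumptions of this example.

```python
from typing import Any, Optional


def sanitize_boolean(value: Any) -> Optional[bool]:
    """Map 'true'/'false' strings and the integers 0 and 1 to bool, else None."""
    if isinstance(value, bool):
        return value
    if isinstance(value, str):
        # Case-insensitive matching and stripping are assumptions of this sketch.
        lowered = value.strip().lower()
        if lowered == "true":
            return True
        if lowered == "false":
            return False
    elif isinstance(value, int) and value in (0, 1):
        return bool(value)
    return None
```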
class intelmq.lib.harmonization.ClassificationTaxonomy

Bases: intelmq.lib.harmonization.String

classification.taxonomy type.

The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/

These old values are automatically mapped to the new ones:

  • ‘abusive content’ -> ‘abusive-content’

  • ‘information gathering’ -> ‘information-gathering’

  • ‘intrusion attempts’ -> ‘intrusion-attempts’

  • ‘malicious code’ -> ‘malicious-code’

Allowed values are:
  • abusive-content

  • availability

  • fraud

  • information-content-security

  • information-gathering

  • intrusion-attempts

  • intrusions

  • malicious-code

  • other

  • test

  • vulnerable

allowed_values = ['abusive-content', 'availability', 'fraud', 'information-content-security', 'information-gathering', 'intrusion-attempts', 'intrusions', 'malicious-code', 'other', 'test', 'vulnerable']
static is_valid(value: str, sanitize: bool = False) bool
static sanitize(value: str) Optional[str]
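The mapping of old classification.taxonomy values onto the allowed ones can be sketched with a plain dictionary lookup. The function name `sanitize_taxonomy` and the lower-casing and stripping step are assumptions; the mapping and the allowed values are copied from the class description.

```python
from typing import Optional

ALLOWED_VALUES = [
    "abusive-content", "availability", "fraud", "information-content-security",
    "information-gathering", "intrusion-attempts", "intrusions",
    "malicious-code", "other", "test", "vulnerable",
]

OLD_TO_NEW = {
    "abusive content": "abusive-content",
    "information gathering": "information-gathering",
    "intrusion attempts": "intrusion-attempts",
    "malicious code": "malicious-code",
}


def sanitize_taxonomy(value: str) -> Optional[str]:
    """Map an old value to its new name and return it if it is allowed."""
    value = value.strip().lower()  # normalising case/whitespace is an assumption
    value = OLD_TO_NEW.get(value, value)
    return value if value in ALLOWED_VALUES else None
```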
class intelmq.lib.harmonization.ClassificationType

Bases: intelmq.lib.harmonization.String

classification.type type.

The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/ with extensions.

These old values are automatically mapped to the new ones:

  • ‘botnet drone’ -> ‘infected-system’

  • ‘ids alert’ -> ‘ids-alert’

  • ‘c&c’ -> ‘c2-server’

  • ‘c2server’ -> ‘c2-server’

  • ‘infected system’ -> ‘infected-system’

  • ‘malware configuration’ -> ‘malware-configuration’

  • ‘Unauthorised-information-access’ -> ‘unauthorised-information-access’

  • ‘leak’ -> ‘data-leak’

  • ‘vulnerable client’ -> ‘vulnerable-system’

  • ‘vulnerable service’ -> ‘vulnerable-system’

  • ‘ransomware’ -> ‘infected-system’

  • ‘unknown’ -> ‘undetermined’

These values changed their taxonomy:
‘malware’: In terms of the taxonomy ‘malicious-code’, such events can be either ‘infected-system’ or ‘malware-distribution’, but in terms of actual malware, it is now in the taxonomy ‘other’.

Allowed values are:
  • application-compromise

  • blacklist

  • brute-force

  • burglary

  • c2-server

  • copyright

  • data-loss

  • ddos

  • ddos-amplifier

  • dga-domain

  • dos

  • exploit

  • harmful-speech

  • ids-alert

  • infected-system

  • information-disclosure

  • data-leak

  • malware

  • malware-configuration

  • malware-distribution

  • masquerade

  • misconfiguration

  • other

  • outage

  • phishing

  • potentially-unwanted-accessible

  • privileged-account-compromise

  • proxy

  • sabotage

  • scanner

  • sniffing

  • social-engineering

  • spam

  • system-compromise

  • test

  • tor

  • unauthorised-information-access

  • unauthorised-information-modification

  • system-compromise

  • unauthorized-use-of-resources

  • unprivileged-account-compromise

  • violence

  • vulnerable-system

  • weak-crypto

  • undetermined

allowed_values = ['application-compromise', 'blacklist', 'brute-force', 'burglary', 'c2-server', 'copyright', 'data-loss', 'ddos', 'ddos-amplifier', 'dga-domain', 'dos', 'exploit', 'harmful-speech', 'ids-alert', 'infected-system', 'information-disclosure', 'data-leak', 'malware', 'malware-configuration', 'malware-distribution', 'masquerade', 'misconfiguration', 'other', 'outage', 'phishing', 'potentially-unwanted-accessible', 'privileged-account-compromise', 'proxy', 'sabotage', 'scanner', 'sniffing', 'social-engineering', 'spam', 'system-compromise', 'test', 'tor', 'unauthorised-information-access', 'unauthorised-information-modification', 'system-compromise', 'unauthorized-use-of-resources', 'unprivileged-account-compromise', 'violence', 'vulnerable-system', 'weak-crypto', 'undetermined']
static is_valid(value: str, sanitize: bool = False) bool
static sanitize(value: str) Optional[str]
class intelmq.lib.harmonization.DateTime

Bases: intelmq.lib.harmonization.String

Date and time type for timestamps.

Valid values are timestamps with time zone and in the format ‘%Y-%m-%dT%H:%M:%S+00:00’. Invalid are missing times and missing timezone information (UTC). Microseconds are also allowed.

Sanitation normalizes the timezone to UTC, which is the only allowed timezone.

The following additional conversions are available with the convert function:

  • timestamp

  • windows_nt: From Windows NT / AD / LDAP

  • epoch_millis: From Milliseconds since Epoch

  • from_format: From a given format, e.g. ‘from_format|%H %M %S %m %d %Y %Z’

  • from_format_midnight: Date from a given format and assume midnight, e.g. ‘from_format_midnight|%d-%m-%Y’

  • utc_isoformat: Parse date generated by datetime.isoformat()

  • fuzzy (or None): Use dateutil’s fuzzy parser, default if no specific parser is given

TIME_CONVERSIONS = {'timestamp': <function DateTime.from_timestamp>, 'windows_nt': <function DateTime.from_windows_nt>, 'epoch_millis': <function DateTime.from_epoch_millis>, 'from_format': <function DateTime.convert_from_format>, 'from_format_midnight': <function DateTime.convert_from_format_midnight>, 'utc_isoformat': <function DateTime.parse_utc_isoformat>, 'fuzzy': <function DateTime.convert_fuzzy>, None: <function DateTime.convert_fuzzy>}
static convert(value, format='fuzzy') str

Converts date time strings according to the given format. If the timezone is not given or clear, the local time zone is assumed!

  • timestamp

  • windows_nt: From Windows NT / AD / LDAP

  • epoch_millis: From Milliseconds since Epoch

  • from_format: From a given format, e.g. ‘from_format|%H %M %S %m %d %Y %Z’

  • from_format_midnight: Date from a given format and assume midnight, e.g. ‘from_format_midnight|%d-%m-%Y’

  • utc_isoformat: Parse date generated by datetime.isoformat()

  • fuzzy (or None): Use dateutil’s fuzzy parser, default if no specific parser is given

static convert_from_format(value: str, format: str) str

Converts a datetime with the given format.

static convert_from_format_midnight(value: str, format: str) str

Converts a date with the given format and adds time 00:00:00 to it.
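As a rough illustration of the midnight conversion, the following stand-alone sketch parses a date with a given format and attaches the time 00:00:00. Treating the result as UTC and returning it via isoformat() are assumptions of this example, not confirmed details of the library.

```python
from datetime import datetime, time, timezone


def convert_from_format_midnight(value: str, format: str) -> str:
    """Parse a date with the given format and add the time 00:00:00 to it."""
    parsed_date = datetime.strptime(value, format).date()
    # Assumption of this sketch: the resulting timestamp is taken to be UTC.
    midnight = datetime.combine(parsed_date, time(0, 0), tzinfo=timezone.utc)
    return midnight.isoformat()
```

For example, the value ‘31-12-2020’ with the format ‘%d-%m-%Y’ yields ‘2020-12-31T00:00:00+00:00’ in this sketch.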

static convert_fuzzy(value) str
static from_epoch_millis(tstamp: str, tzone='UTC') datetime.datetime

Returns ISO formatted datetime from given epoch timestamp with milliseconds. It ignores the milliseconds, converts it into normal timestamp and processes it.

static from_timestamp(tstamp: int, tzone='UTC') str

Returns ISO formatted datetime from given timestamp. You can give timezone for given timestamp, UTC by default.

static from_windows_nt(tstamp: int) str

Converts the Windows NT / LDAP / Active Directory format to ISO format.

The format is: 100 nanoseconds (10^-7s) since 1601-01-01. UTC is assumed.

Parameters

tstamp – Time in LDAP format as integer or string. Will be converted if necessary.

Returns

Converted ISO format string
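The arithmetic behind the Windows NT format (100-nanosecond intervals since 1601-01-01, UTC) can be worked through with the standard library. This is a sketch of the conversion, not the library's code; sub-microsecond precision is discarded here.

```python
from datetime import datetime, timedelta, timezone
from typing import Union

# Start of the Windows NT / LDAP epoch, in UTC as stated above.
NT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def from_windows_nt(tstamp: Union[int, str]) -> str:
    """Convert 100 ns ticks since 1601-01-01 (UTC) to an ISO format string."""
    ticks = int(tstamp)  # strings are converted if necessary
    # Integer division by 10 turns 100 ns ticks into microseconds.
    return (NT_EPOCH + timedelta(microseconds=ticks // 10)).isoformat()
```

The value 116444736000000000 corresponds to 11644473600 seconds after 1601-01-01, which is the Unix epoch, 1970-01-01T00:00:00+00:00.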

static generate_datetime_now() str
static is_valid(value: str, sanitize: bool = False) bool
midnight = datetime.time(0, 0)
static parse_utc_isoformat(value: str, return_datetime: bool = False) Union[datetime.datetime, str]

Parse format generated by datetime.isoformat() method with UTC timezone. It is much faster than universal dateutil parser. Can be used for parsing DateTime fields which are already parsed.

Returns a string with ISO format. If return_datetime is True, the return value is a datetime.datetime object.

static sanitize(value: str) Optional[str]
class intelmq.lib.harmonization.FQDN

Bases: intelmq.lib.harmonization.String

Fully qualified domain name type.

All valid lowercase domains are accepted, no IP addresses or URLs. Trailing dot is not allowed.

To prevent values like ‘10.0.0.1:8080’ (#1235), we check for the non-existence of ‘:’.

static is_valid(value: str, sanitize: bool = False) bool
static sanitize(value: str) Optional[str]
static to_ip(value: str) Optional[str]
class intelmq.lib.harmonization.Float

Bases: intelmq.lib.harmonization.GenericType

Float type. Without sanitation only python float/integer/long is accepted. Boolean is explicitly denied.

Sanitation accepts strings and everything float() accepts.

static is_valid(value: float, sanitize: bool = False) bool
static sanitize(value: float) Optional[float]
class intelmq.lib.harmonization.GenericType

Bases: object

static is_valid(value: str, sanitize: bool = False) bool
static sanitize(value) Optional[str]
class intelmq.lib.harmonization.IPAddress

Bases: intelmq.lib.harmonization.String

Type for IP addresses, all families. Uses the ipaddress module.

Sanitation accepts integers, strings and objects of ipaddress.IPv4Address and ipaddress.IPv6Address.

Valid values are only strings. 0.0.0.0 is explicitly not allowed.

static is_valid(value: str, sanitize: bool = False) bool
static sanitize(value: Union[int, str]) Optional[str]
static to_int(value: str) Optional[int]
static to_reverse(ip_addr: str) str
static version(value: str) int
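The sanitation described for IPAddress can be approximated directly with the standard ipaddress module. The function name `sanitize_ip` is hypothetical, and returning None for invalid input is an assumption of this sketch.

```python
import ipaddress
from typing import Optional, Union


def sanitize_ip(value: Union[int, str]) -> Optional[str]:
    """Return the address as a string, or None if invalid or 0.0.0.0."""
    try:
        address = ipaddress.ip_address(value)  # accepts integers and strings
    except ValueError:
        return None
    # 0.0.0.0 is explicitly not allowed according to the description above.
    if address == ipaddress.ip_address("0.0.0.0"):
        return None
    return str(address)
```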
class intelmq.lib.harmonization.IPNetwork

Bases: intelmq.lib.harmonization.String

Type for IP networks, all families. Uses the ipaddress module.

Sanitation accepts strings and objects of ipaddress.IPv4Network and ipaddress.IPv6Network. If host bits in strings are set, they will be ignored (e.g. 127.0.0.1/32).

Valid values are only strings.

static is_valid(value: str, sanitize: bool = False) bool
static sanitize(value: str) Optional[str]
static version(value: str) int
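Ignoring set host bits, as described for IPNetwork, corresponds to the `strict=False` option of ipaddress.ip_network. The wrapper name `sanitize_network` and the None return for invalid input are assumptions of this sketch.

```python
import ipaddress
from typing import Optional


def sanitize_network(value: str) -> Optional[str]:
    """Return the network in canonical form, masking out any host bits."""
    try:
        # strict=False masks out host bits instead of raising ValueError.
        return str(ipaddress.ip_network(value, strict=False))
    except ValueError:
        return None
```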
class intelmq.lib.harmonization.Integer

Bases: intelmq.lib.harmonization.GenericType

Integer type. Without sanitation only python integer/long is accepted. Bool is explicitly denied.

Sanitation accepts strings and everything int() accepts.

static is_valid(value: int, sanitize: bool = False) bool
static sanitize(value: int) Optional[int]
class intelmq.lib.harmonization.JSON

Bases: intelmq.lib.harmonization.String

JSON type.

Sanitation accepts any valid JSON objects.

Valid values are only unicode strings with JSON objects.

static is_valid(value: str, sanitize: bool = False) bool
static sanitize(value: str) Optional[str]
class intelmq.lib.harmonization.JSONDict

Bases: intelmq.lib.harmonization.JSON

JSONDict type.

Sanitation accepts Python dictionaries and JSON strings.

Valid values are only unicode strings with JSON dictionaries.

static is_valid(value: str, sanitize: bool = False) bool
static is_valid_subitem(value: str) bool
static sanitize(value: str) Optional[str]
static sanitize_subitem(value: str) str
class intelmq.lib.harmonization.LowercaseString

Bases: intelmq.lib.harmonization.String

Like string, but only allows lower case characters.

Sanitation lowers all characters.

static is_valid(value: str, sanitize: bool = False) bool
static sanitize(value: str) Optional[bool]
class intelmq.lib.harmonization.Registry

Bases: intelmq.lib.harmonization.UppercaseString

Registry type. Derived from UppercaseString.

Only valid values: AFRINIC, APNIC, ARIN, LACNIC, RIPE. RIPE-NCC and RIPENCC are normalized to RIPE.

ENUM = ['AFRINIC', 'APNIC', 'ARIN', 'LACNIC', 'RIPE']
static is_valid(value: str, sanitize: bool = False) bool
static sanitize(value: str) str
class intelmq.lib.harmonization.String

Bases: intelmq.lib.harmonization.GenericType

Any non-empty string without leading or trailing whitespace.

static is_valid(value: str, sanitize: bool = False) bool
class intelmq.lib.harmonization.TLP

Bases: intelmq.lib.harmonization.UppercaseString

TLP level type. Derived from UppercaseString.

Only valid values: WHITE, GREEN, AMBER, RED.

Accepted for sanitation are different cases and the prefix ‘tlp:’.

enum = ['WHITE', 'GREEN', 'AMBER', 'RED']
static is_valid(value: str, sanitize: bool = False) bool
prefix_pattern = re.compile('^(TLP:?)?\\s*', re.IGNORECASE)
static sanitize(value: str) Optional[str]
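Using the prefix_pattern and enum values listed for the TLP class, its sanitation can be sketched as follows. The function name `sanitize_tlp` and the exact order of the steps are assumptions; only the pattern and the allowed values come from the class attributes.

```python
import re
from typing import Optional

ENUM = ["WHITE", "GREEN", "AMBER", "RED"]
PREFIX_PATTERN = re.compile(r"^(TLP:?)?\s*", re.IGNORECASE)


def sanitize_tlp(value: str) -> Optional[str]:
    """Strip an optional 'tlp:' prefix, upper-case, and check against ENUM."""
    value = PREFIX_PATTERN.sub("", value.strip()).upper()
    return value if value in ENUM else None
```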
class intelmq.lib.harmonization.URL

Bases: intelmq.lib.harmonization.String

URI type. Local and remote.

Sanitation converts hxxp and hxxps to http and https. For local URIs (file) a missing host is replaced by localhost.

Valid values must have the host (network location part).

static is_valid(value: str, sanitize: bool = False) bool
static sanitize(value: str) Optional[str]
static to_domain_name(url: str) Optional[str]
static to_ip(url: str) Optional[str]
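The hxxp/hxxps replacement and the host requirement described for the URL type can be sketched with urllib.parse. The function name `sanitize_url` is hypothetical, and the localhost substitution for file URIs is left out of this example.

```python
from typing import Optional
from urllib.parse import urlsplit


def sanitize_url(value: str) -> Optional[str]:
    """Convert hxxp(s) to http(s) and require a host part."""
    value = value.strip()
    lowered = value.lower()
    if lowered.startswith("hxxps://"):
        value = "https://" + value[len("hxxps://"):]
    elif lowered.startswith("hxxp://"):
        value = "http://" + value[len("hxxp://"):]
    # A valid value must have the host (network location part).
    return value if urlsplit(value).netloc else None
```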
class intelmq.lib.harmonization.UppercaseString

Bases: intelmq.lib.harmonization.String

Like string, but only allows upper case characters.

Sanitation uppers all characters.

static is_valid(value: str, sanitize: bool = False) bool
static sanitize(value: str) Optional[str]

intelmq.lib.message module

Messages are the information packages in pipelines.

Use MessageFactory to get a Message object (types Report and Event).

class intelmq.lib.message.Event(message: Union[dict, tuple] = (), auto: bool = False, harmonization: Optional[dict] = None)

Bases: intelmq.lib.message.Message

__init__(message: Union[dict, tuple] = (), auto: bool = False, harmonization: Optional[dict] = None) None
Parameters
  • message – Give a report and feed.name, feed.url and time.observation will be used to construct the Event if given. If it’s another type, the value is given to dict’s init

  • auto – unused here

  • harmonization – Harmonization definition to use

class intelmq.lib.message.Message(message: Union[dict, tuple] = (), auto: bool = False, harmonization: Optional[dict] = None)

Bases: dict

add(key: str, value: str, sanitize: bool = True, overwrite: Optional[bool] = None, ignore: Sequence = (), raise_failure: bool = True) Optional[bool]

Add a value for the key (after sanitation).

Parameters
  • key – Key as defined in the harmonization

  • value – A valid value as defined in the harmonization. If the value is None or in _IGNORED_VALUES, the value will be ignored. If the value is ignored, the key exists and overwrite is True, the key is deleted.

  • sanitize – Sanitation of harmonization type will be called before validation (default: True)

  • overwrite – Overwrite an existing value if it already exists (default: None). If True, overwrite an existing value. If False, do not overwrite an existing value. If None, raise intelmq.lib.exceptions.KeyExists for an existing value.

  • raise_failure – Whether an intelmq.lib.exceptions.InvalidValue should be raised for invalid values (default: True). If False, the return value will be False in case of invalid values.

Returns

  • True if the value has been added.

  • False if the value is invalid and raise_failure is False or the value existed and has not been overwritten.

  • None if the value has been ignored.

Raises
change(key: str, value: str, sanitize: bool = True)
copy() a shallow copy of D
deep_copy()
finditems(keyword: str)
get(key, default=None)

Return the value for key if key is in the dictionary, else default.

hash(*, filter_keys: Iterable = frozenset({}), filter_type: str = 'blacklist')

Return a SHA256 hash of the message as a hexadecimal string. The hash is computed over almost all key/value pairs. Depending on the filter_type parameter (blacklist or whitelist), the keys defined in the filter_keys parameter will be considered as the keys to ignore or the only ones to consider. If given, the filter_keys parameter should be a set.

‘time.observation’ will always be ignored.
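The blacklist/whitelist filtering can be illustrated on a plain dictionary. How the key/value pairs are serialized before hashing is not specified above, so the JSON encoding and the sorted key order used here are assumptions, and the digests will not match those of the library.

```python
import hashlib
import json
from typing import Iterable


def message_hash(message: dict, *, filter_keys: Iterable = frozenset(),
                 filter_type: str = "blacklist") -> str:
    """Return a SHA256 hex digest over the filtered key/value pairs."""
    filter_keys = set(filter_keys)
    digest = hashlib.sha256()
    for key in sorted(message):
        if key == "time.observation":
            continue  # always ignored, as documented
        if filter_type == "whitelist" and key not in filter_keys:
            continue
        if filter_type == "blacklist" and key in filter_keys:
            continue
        # The byte layout fed to the hash is an assumption of this sketch.
        digest.update(json.dumps([key, message[key]]).encode("utf-8"))
    return digest.hexdigest()
```

Because time.observation is skipped, two messages that differ only in their observation time produce the same digest, which is what makes such a hash usable for deduplication.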

is_valid(key: str, value: str, sanitize: bool = True) bool

Checks if a value is valid for the key (after sanitation).

Parameters
  • key – Key of the field

  • value – Value of the field

  • sanitize – Sanitation of harmonization type will be called before validation (default: True)

Returns

True if the value is valid, otherwise False

Raises

intelmq.lib.exceptions.InvalidKey – if given key is invalid.

serialize()
set_default_value(value: Optional[Any] = None)

Sets a default value for items.

to_dict(hierarchical: bool = False, with_type: bool = False, jsondict_as_string: bool = False) dict

Returns a copy of self, only based on a dict class.

Parameters
  • hierarchical – Split all keys at a dot and save these subitems in dictionaries.

  • with_type – Add a value named __type containing the message type

  • jsondict_as_string – If False (default), treat values in JSONDict fields just as normal ones. If True, save such fields as JSON-encoded strings. This is the old behavior before version 1.1.

Returns

A dictionary as copy of itself modified according to the given parameters

Return type

new_dict
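The effect of the hierarchical option (splitting keys at dots into nested dictionaries) can be sketched on a plain dict. The function name `to_hierarchical` is hypothetical, and the sketch assumes that no key is a dotted prefix of another key.

```python
def to_hierarchical(flat: dict) -> dict:
    """Split all keys at dots and store the parts in nested dictionaries."""
    result: dict = {}
    for key, value in flat.items():
        node = result
        *parents, leaf = key.split(".")
        for part in parents:
            # Create the intermediate dictionary on first use.
            node = node.setdefault(part, {})
        node[leaf] = value
    return result
```

For instance, the keys ‘source.ip’ and ‘source.port’ end up as the entries ‘ip’ and ‘port’ of one nested ‘source’ dictionary.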

to_json(hierarchical=False, with_type=False, jsondict_as_string=False)
static unserialize(message_string: str)
update([E, ]**F) None.  Update D from dict/iterable E and F.

If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]

class intelmq.lib.message.MessageFactory

Bases: object

unserialize: JSON encoded message to object

serialize: object to JSON encoded object

static from_dict(message: dict, harmonization=None, default_type: Optional[str] = None) dict

Takes dictionary Message object, returns instance of correct class.

Parameters
  • message – the message which should be converted to a Message object

  • harmonization – a dictionary holding the used harmonization

  • default_type – If ‘__type’ is not present in message, the given type will be used

See also

MessageFactory.unserialize MessageFactory.serialize

static serialize(message)

Takes instance of message-derived class and makes JSON-encoded Message.

The class is saved in __type attribute.

static unserialize(raw_message: str, harmonization: Optional[dict] = None, default_type: Optional[str] = None) dict

Takes JSON-encoded Message object, returns instance of correct class.

Parameters
  • raw_message – the JSON-encoded message which should be converted to a Message object

  • harmonization – a dictionary holding the used harmonization

  • default_type – If ‘__type’ is not present in message, the given type will be used

See also

MessageFactory.from_dict MessageFactory.serialize
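
The round trip described above (the class name stored under __type on serialization, and used to pick the class on unserialization) can be sketched as follows. This is a simplified stand-in under stated assumptions: the Message, Event and Report classes, the CLASSES lookup table and the ValueError are all hypothetical and only mimic the idea; they are not the IntelMQ implementation and perform no harmonization checks.

```python
import json
from typing import Optional


class Message(dict):
    """Hypothetical stand-in for a message base class."""


class Event(Message):
    pass


class Report(Message):
    pass


# Hypothetical lookup table from type name to class.
CLASSES = {"Event": Event, "Report": Report}


def serialize(message: Message) -> str:
    """Encode the message as JSON and record its class name in __type."""
    payload = dict(message)
    payload["__type"] = type(message).__name__
    return json.dumps(payload)


def unserialize(raw_message: str, default_type: Optional[str] = None) -> Message:
    """Decode JSON and build an instance of the class named in __type."""
    payload = json.loads(raw_message)
    # Fall back to default_type when the message carries no __type.
    type_name = payload.pop("__type", default_type)
    if type_name not in CLASSES:
        raise ValueError(f"unknown message type: {type_name!r}")
    return CLASSES[type_name](payload)


raw = serialize(Event({"source.ip": "192.0.2.1"}))
restored = unserialize(raw)
untyped = unserialize('{"feed.name": "example-feed"}', default_type="Report")
```

Here restored is an Event again, while untyped becomes a Report only because default_type was given.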

class intelmq.lib.message.Report(message: Union[dict, tuple] = (), auto: bool = False, harmonization: Optional[dict] = None)

Bases: intelmq.lib.message.Message

__init__(message: Union[dict, tuple] = (), auto: bool = False, harmonization: Optional[dict] = None) None
Parameters
  • message – Passed along to Message’s and dict’s init. If this is an instance of the Event class, the resulting Report instance has only the fields which are possible in Report, all others are stripped.

  • auto – if False (default), time.observation is automatically added.

  • harmonization – Harmonization definition to use

copy() a shallow copy of D

intelmq.lib.pipeline module

class intelmq.lib.pipeline.Amqp(logger, pipeline_args: Optional[dict] = None, load_balance=False, is_multithreaded=False)

Bases: intelmq.lib.pipeline.Pipeline

check_connection()
clear_queue(queue: str) bool
connect()
count_queued_messages(*queues) dict
destination_pipeline_amqp_exchange = ''
destination_pipeline_amqp_virtual_host = '/'
destination_pipeline_db = 2
destination_pipeline_host = '127.0.0.1'
destination_pipeline_password = None
destination_pipeline_socket_timeout = None
destination_pipeline_ssl = False
destination_pipeline_username = None
disconnect()
intelmqctl_rabbitmq_monitoring_url = None
load_configurations(queues_type)
nonempty_queues() set
queue_args = {'x-queue-mode': 'lazy'}
send(message: str, path: str = '_default', path_permissive: bool = False)

In principle we could use AMQP’s exchanges here, but that architecture is incompatible with the format of our pipeline configuration.

set_queues(queues: dict, queues_type: str)
Parameters
  • queues – For the source queue, it’s just a string. For destination queues, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)

  • queues_type – “source” or “destination”

The method assures self.destination_queues are in the form of dict of lists. It doesn’t assure there is a ‘_default’ key.

setup_channel()
source_pipeline_amqp_exchange = ''
source_pipeline_amqp_virtual_host = '/'
source_pipeline_db = 2
source_pipeline_host = '127.0.0.1'
source_pipeline_password = None
source_pipeline_socket_timeout = None
source_pipeline_ssl = False
source_pipeline_username = None
class intelmq.lib.pipeline.Pipeline(logger, pipeline_args: Optional[dict] = None, load_balance=False, is_multithreaded=False)

Bases: object

acknowledge()

Acknowledge/delete the current message from the source queue.

Raises

exceptions.PipelineError – if no message is held

Returns

None.

clear_queue(queue)
connect()
disconnect()
has_internal_queues = False
nonempty_queues() set
receive() str
reject_message()
send(message: str, path: str = '_default', path_permissive: bool = False)
set_queues(queues: Optional[str], queues_type: str)
Parameters
  • queues – For the source queue, it’s just a string. For destination queues, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)

  • queues_type – “source” or “destination”

The method assures self.destination_queues are in the form of dict of lists. It doesn’t assure there is a ‘_default’ key.
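
The normalization into a dict of lists can be pictured with the minimal sketch below. The function normalize_destination_queues is hypothetical and only follows the description above; in particular, mapping None to an empty dict, wrapping a single string value in a one-element list, and raising TypeError for other inputs are assumptions of this example, not documented behavior.

```python
from typing import Dict, List, Union

QueueSpec = Union[None, List[str], Dict[str, Union[str, List[str]]]]


def normalize_destination_queues(queues: QueueSpec) -> Dict[str, List[str]]:
    """Bring a destination queue definition into dict-of-lists form."""
    if queues is None:
        return {}  # assumption: no destinations configured
    if isinstance(queues, list):
        # A bare list is taken as the queues of the '_default' path.
        return {"_default": list(queues)}
    if isinstance(queues, dict):
        return {
            path: list(names) if isinstance(names, list) else [names]
            for path, names in queues.items()
        }
    raise TypeError(f"unsupported queues definition: {queues!r}")


from_list = normalize_destination_queues(["parser-queue", "archive-queue"])
from_dict = normalize_destination_queues(
    {"_default": "parser-queue", "errors": ["error-queue", "audit-queue"]}
)
```

As in the description, nothing here adds a ‘_default’ key when the given dict lacks one.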

class intelmq.lib.pipeline.PipelineFactory

Bases: object

static create(logger, broker=None, direction=None, queues=None, pipeline_args=None, load_balance=False, is_multithreaded=False)

Parameters
  • direction – “source” or “destination”, optional, needed for queues

  • queues – needs direction to be set, calls set_queues

  • bot – Bot instance

class intelmq.lib.pipeline.Pythonlist(logger, pipeline_args: Optional[dict] = None, load_balance=False, is_multithreaded=False)

Bases: intelmq.lib.pipeline.Pipeline

This pipeline uses simple lists and is only for testing purposes.

It behaves in most ways like a normal pipeline would do, but works entirely without external modules and programs. Data is saved as it comes (no conversion) and it is not blocking.

_acknowledge()

Removes a message from the internal queue and returns it

_receive() bytes

Receives the last not yet acknowledged message.

Does not block unlike the other pipelines.

_reject_message()

No-op because of the internal queue

clear_queue(queue)

Empties given queue.

connect()
count_queued_messages(*queues) dict

Returns the amount of queued messages over all given queue names.

disconnect()
send(message: str, path: str = '_default', path_permissive: bool = False)

Sends a message to the destination queues

set_queues(queues, queues_type)
Parameters
  • queues – For the source queue, it’s just a string. For destination queues, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)

  • queues_type – “source” or “destination”

The method assures self.destination_queues are in the form of dict of lists. It doesn’t assure there is a ‘_default’ key.

state = {}
class intelmq.lib.pipeline.Redis(logger, pipeline_args: Optional[dict] = None, load_balance=False, is_multithreaded=False)

Bases: intelmq.lib.pipeline.Pipeline

_reject_message()

Rejecting is a no-op as the message is in the internal queue anyway.

clear_queue(queue)

Clears a queue by removing (deleting) the key, which is the same as an empty list in Redis

connect()
count_queued_messages(*queues) dict
destination_pipeline_db = 2
destination_pipeline_host = '127.0.0.1'
destination_pipeline_password = None
disconnect()
has_internal_queues = True
load_configurations(queues_type)
nonempty_queues() set

Returns a set of all currently non-empty queues.

pipe = None
send(message: str, path: str = '_default', path_permissive: bool = False)
set_queues(queues, queues_type)
Parameters
  • queues – For the source queue, it’s just a string. For destination queues, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)

  • queues_type – “source” or “destination”

The method assures self.destination_queues are in the form of dict of lists. It doesn’t assure there is a ‘_default’ key.

source_pipeline_db = 2
source_pipeline_host = '127.0.0.1'
source_pipeline_password = None

intelmq.lib.splitreports module

Support for splitting large raw reports into smaller ones.

The main intention of this module is to help work around a limitation in Redis, which limits strings to 512MB. Collector bots can use the functions in this module to split the incoming data into smaller pieces which can be sent as separate reports.

Collectors usually don’t really know anything about the data they collect, so the data cannot be reliably split into pieces in all cases. This module can be used for those cases, though, where users know that the data is actually in a line-based format and can easily be split into pieces at newline characters. For this to work, some assumptions are made:

  • The data can be split at any newline character

    This would not work for, e.g., CSV-based formats which allow newlines in values as long as they’re within quotes.

  • The lines are much shorter than the maximum chunk size

    Obviously, if this condition does not hold, it may not be possible to split the data into small enough chunks at newline characters.

Other considerations:

  • To accommodate CSV formats, the code can optionally replicate the first line of the file at the start of all chunks.

  • The Redis limit applies to the entire IntelMQ report, not just the raw data. The report has some metadata in addition to the raw data, and the raw data is encoded as base64 in the report. The maximum chunk size must take this into account, by multiplying the actual limit by 3/4 and subtracting a generous amount for the metadata.
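
The arithmetic in the last point can be sketched as follows. The factor 3/4 comes from base64 turning every 3 raw bytes into 4 encoded bytes. The names below are hypothetical, the 512MB limit is read as 512 MiB, and the 1 MiB metadata allowance is an arbitrary value chosen for this example, not a recommendation from the module.

```python
REDIS_STRING_LIMIT = 512 * 1024 * 1024  # 512MB limit, read here as 512 MiB
METADATA_ALLOWANCE = 1024 * 1024  # assumption: 1 MiB reserved for metadata


def max_chunk_size(limit: int = REDIS_STRING_LIMIT,
                   metadata_allowance: int = METADATA_ALLOWANCE) -> int:
    """Largest raw chunk whose base64 form plus metadata stays below limit."""
    # base64 expands data by 4/3, so only 3/4 of the limit is usable for raw data.
    return limit * 3 // 4 - metadata_allowance


def base64_encoded_size(raw_size: int) -> int:
    """Length of the padded base64 encoding of raw_size bytes."""
    return 4 * ((raw_size + 2) // 3)


chunk_size = max_chunk_size()
print(chunk_size)
print(base64_encoded_size(chunk_size) + METADATA_ALLOWANCE <= REDIS_STRING_LIMIT)
```

With these assumed numbers the usable chunk size is a little under 384 MiB.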

intelmq.lib.splitreports.generate_reports(report_template: intelmq.lib.message.Report, infile: BinaryIO, chunk_size: Optional[int], copy_header_line: bool) Generator[intelmq.lib.message.Report, None, None]

Generate reports from a template and input file, optionally split into chunks.

If chunk_size is None, a single report is generated with the entire contents of infile as the raw data. Otherwise chunk_size should be an integer giving the maximum number of bytes in a chunk. The data read from infile is then split into chunks of this size at newline characters (see read_delimited_chunks). For each of the chunks, this function yields a copy of the report_template with that chunk as the value of the raw attribute.

When splitting the data into chunks, if copy_header_line is true, the first line of the file is read before chunking and then prepended to each of the chunks. This is particularly useful when splitting CSV files.

The infile should be a file-like object. generate_reports uses only two methods, readline and read, with readline only called once and only if copy_header_line is true. Both methods should return bytes objects.

Parameters
  • report_template – report used as template for all yielded copies

  • infile – stream to read from

  • chunk_size – maximum size of each chunk

  • copy_header_line – copy the first line of the infile to each chunk

Yields

report – a Report object holding the chunk in the raw field
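
The effect of copy_header_line can be illustrated with a simplified sketch that works on bytes only. The generator chunks_with_header is a hypothetical stand-in: it iterates over lines instead of using read, it counts the header towards the chunk size (an assumption), and it yields plain byte strings rather than Report objects.

```python
import io
from typing import BinaryIO, Iterator


def chunks_with_header(infile: BinaryIO, chunk_size: int) -> Iterator[bytes]:
    """Yield newline-aligned chunks, each starting with the first line of infile."""
    header = infile.readline()
    current = b""
    for line in infile:
        # Start a new chunk once adding this line would exceed chunk_size.
        if current and len(header) + len(current) + len(line) > chunk_size:
            yield header + current
            current = b""
        current += line
    if current:
        yield header + current


data = io.BytesIO(b"ip,port\n192.0.2.1,80\n192.0.2.2,443\n192.0.2.3,22\n")
chunks = list(chunks_with_header(data, chunk_size=40))
for chunk in chunks:
    print(chunk)
```

Each chunk is a valid CSV document on its own because the ip,port header line is repeated at its start.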

intelmq.lib.splitreports.read_delimited_chunks(infile: BinaryIO, chunk_size: int) Generator[bytes, None, None]

Yield the contents of infile in chunk_size pieces ending at newlines. The individual pieces, except for the last one, end in newlines and are smaller than chunk_size if possible.

Parameters
  • infile – stream to read from

  • chunk_size – maximum size of each chunk

Yields

chunk – chunk with maximum size of chunk_size if possible

intelmq.lib.splitreports.split_chunks(chunk: bytes, chunk_size: int) List[bytes]

Split a bytestring into chunk_size pieces at ASCII newline characters.

The return value is a list of bytestring objects. Appending all of them yields a bytestring equal to the input string. All items in the list except the last item end in a newline. The items are shorter than chunk_size if possible, but may be longer if the input data has places where the distance between two newline characters is too long.

Note in particular, that the last item may not end in a newline!

Parameters
  • chunk – The string to be split

  • chunk_size – maximum size of each chunk

Returns

List of resulting chunks

Return type

chunks
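
The contract described for split_chunks can be demonstrated with the following sketch. The function split_at_newlines is a hypothetical re-implementation written only from the description above, not the module's code; it allows pieces of exactly chunk_size bytes, which is an assumption of this example.

```python
from typing import List


def split_at_newlines(chunk: bytes, chunk_size: int) -> List[bytes]:
    """Split chunk at newlines into pieces of at most chunk_size bytes if possible."""
    pieces = []
    start = 0
    while len(chunk) - start > chunk_size:
        # Last newline that keeps the piece within chunk_size bytes.
        cut = chunk.rfind(b"\n", start, start + chunk_size)
        if cut == -1:
            # The line is too long: accept an oversized piece up to the next newline.
            cut = chunk.find(b"\n", start + chunk_size)
            if cut == -1:
                break  # no newline left, the rest becomes the last piece
        pieces.append(chunk[start:cut + 1])
        start = cut + 1
    if start < len(chunk):
        pieces.append(chunk[start:])
    return pieces


data = b"aa\nbbbb\ncc\ndddddddddd\nee"
pieces = split_at_newlines(data, chunk_size=8)
print(pieces)
```

The third piece is longer than chunk_size because its line contains no earlier newline, and the last piece does not end in a newline, matching the notes above.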

intelmq.lib.test module

Utilities for testing intelmq bots.

The BotTestCase can be used as base class for unittests on bots. It includes some basic generic tests (logged errors, correct pipeline setup).

class intelmq.lib.test.BotTestCase

Bases: object

Provides common tests and assert methods for bot testing.

assertAnyLoglineEqual(message: str, levelname: str = 'ERROR')

Asserts if any logline matches a specific requirement.

Parameters
  • message – Message text which is compared

  • levelname – Log level of the logline which is asserted

Raises

ValueError – if logline message has not been found

assertLogMatches(pattern: str, levelname: str = 'ERROR')

Asserts if any logline matches a specific requirement.

Parameters
  • pattern – Message text which is compared, regular expression.

  • levelname – Log level of the logline which is asserted, upper case.

assertLoglineEqual(line_no: int, message: str, levelname: str = 'ERROR')

Asserts if a logline matches a specific requirement.

Parameters
  • line_no – Number of the logline which is asserted

  • message – Message text which is compared

  • levelname – Log level of logline which is asserted

assertLoglineMatches(line_no: int, pattern: str, levelname: str = 'ERROR')

Asserts if a logline matches a specific requirement.

Parameters
  • line_no – Number of the logline which is asserted

  • pattern – Message text which is compared

  • levelname – Log level of the logline which is asserted

assertMessageEqual(queue_pos, expected_msg, compare_raw=True, path='_default')

Asserts that the given expected_msg is contained in the generated event with the given queue position.

assertNotRegexpMatchesLog(pattern)

Asserts that pattern doesn’t match against log.

assertOutputQueueLen(queue_len=0, path='_default')

Asserts that the output queue has the expected length.

assertRegexpMatchesLog(pattern)

Asserts that pattern matches against log.

bot_types = {'collector': 'CollectorBot', 'expert': 'ExpertBot', 'output': 'OutputBot', 'parser': 'ParserBot'}
get_input_internal_queue()

Returns the internal input queue of this bot which can be filled with fixture data in setUp()

get_input_queue()

Returns the input queue of this bot which can be filled with fixture data in setUp()

get_mocked_logger(logger)
get_output_queue(path='_default')

Getter for items in the output queues of this bot. Use in TestCase scenarios. If there are multiple queues in the named queue group, all the items are returned chained.

harmonization = {'event': {'classification.identifier': {'description': 'The lowercase identifier defines the actual software or service (e.g. ``heartbleed`` or ``ntp_version``) or standardized malware name (e.g. ``zeus``). Note that you MAY overwrite this field during processing for your individual setup. This field is not standardized across IntelMQ setups/users.', 'type': 'String'}, 'classification.taxonomy': {'description': 'We recognize the need for the CSIRT teams to apply a static (incident) taxonomy to abuse data. With this goal in mind the type IOC will serve as a basis for this activity. Each value of the dynamic type mapping translates to a an element in the static taxonomy. The European CSIRT teams for example have decided to apply the eCSIRT.net incident classification. The value of the taxonomy key is thus a derivative of the dynamic type above. For more information about check `ENISA taxonomies <http://www.enisa.europa.eu/activities/cert/support/incident-management/browsable/incident-handling-process/incident-taxonomy/existing-taxonomies>`_.', 'length': 100, 'type': 'ClassificationTaxonomy'}, 'classification.type': {'description': 'The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid *type explosion*, which in turn dilutes the business value of the dynamic typing. 
In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.', 'type': 'ClassificationType'}, 'comment': {'description': 'Free text commentary about the abuse event inserted by an analyst.', 'type': 'String'}, 'destination.abuse_contact': {'description': 'Abuse contact for destination address. A comma separated list.', 'type': 'LowercaseString'}, 'destination.account': {'description': 'An account name or email address, which has been identified to relate to the destination of an abuse event.', 'type': 'String'}, 'destination.allocated': {'description': 'Allocation date corresponding to BGP prefix.', 'type': 'DateTime'}, 'destination.as_name': {'description': 'The autonomous system name to which the connection headed.', 'type': 'String'}, 'destination.asn': {'description': 'The autonomous system number to which the connection headed.', 'type': 'ASN'}, 'destination.domain_suffix': {'description': 'The suffix of the domain from the public suffix list.', 'type': 'FQDN'}, 'destination.fqdn': {'description': 'A DNS name related to the host from which the connection originated. DNS allows even binary data in DNS, so we have to allow everything. 
A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'destination.geolocation.cc': {'description': 'Country-Code according to ISO3166-1 alpha-2 for the destination IP.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'destination.geolocation.city': {'description': 'Some geolocation services refer to city-level geolocation.', 'type': 'String'}, 'destination.geolocation.country': {'description': 'The country name derived from the ISO3166 country code (assigned to cc field).', 'type': 'String'}, 'destination.geolocation.latitude': {'description': 'Latitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'destination.geolocation.longitude': {'description': 'Longitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'destination.geolocation.region': {'description': 'Some geolocation services refer to region-level geolocation.', 'type': 'String'}, 'destination.geolocation.state': {'description': 'Some geolocation services refer to state-level geolocation.', 'type': 'String'}, 'destination.ip': {'description': 'The IP which is the target of the observed connections.', 'type': 'IPAddress'}, 'destination.local_hostname': {'description': 'Some sources report a internal hostname within a NAT related to the name configured for a compromized system', 'type': 'String'}, 'destination.local_ip': {'description': 'Some sources report a internal (NATed) IP address related a compromized system. N.B. RFC1918 IPs are OK here.', 'type': 'IPAddress'}, 'destination.network': {'description': 'CIDR for an autonomous system. Also known as BGP prefix. 
If multiple values are possible, select the most specific.', 'type': 'IPNetwork'}, 'destination.port': {'description': 'The port to which the connection headed.', 'type': 'Integer'}, 'destination.registry': {'description': 'The IP registry a given ip address is allocated by.', 'length': 7, 'type': 'Registry'}, 'destination.reverse_dns': {'description': 'Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'destination.tor_node': {'description': 'If the destination IP was a known tor node.', 'type': 'Boolean'}, 'destination.url': {'description': 'A URL denotes on IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource.', 'type': 'URL'}, 'destination.urlpath': {'description': 'The path portion of an HTTP or related network request.', 'type': 'String'}, 'event_description.target': {'description': 'Some sources denominate the target (organization) of a an attack.', 'type': 'String'}, 'event_description.text': {'description': 'A free-form textual description of an abuse event.', 'type': 'String'}, 'event_description.url': {'description': 'A description URL is a link to a further description of the the abuse event in question.', 'type': 'URL'}, 'event_hash': {'description': 'Computed event hash with specific keys and values that identify a unique event. At present, the hash should default to using the SHA1 function. Please note that for an event hash to be able to match more than one event (deduplication) the receiver of an event should calculate it based on a minimal set of keys and values present in the event. 
Using for example the observation time in the calculation will most likely render the checksum useless for deduplication purposes.', 'length': 40, 'regex': '^[A-F0-9./]+$', 'type': 'UppercaseString'}, 'extra': {'description': 'All anecdotal information, which cannot be parsed into the data harmonization elements. E.g. os.name, os.version, etc.  **Note**: this is only intended for mapping any fields which can not map naturally into the data harmonization. It is not intended for extending the data harmonization with your own fields.', 'type': 'JSONDict'}, 'feed.accuracy': {'description': 'A float between 0 and 100 that represents how accurate the data in the feed is', 'type': 'Accuracy'}, 'feed.code': {'description': 'Code name for the feed, e.g. DFGS, HSDAG etc.', 'length': 100, 'type': 'String'}, 'feed.documentation': {'description': 'A URL or hint where to find the documentation of this feed.', 'type': 'String'}, 'feed.name': {'description': 'Name for the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.provider': {'description': 'Name for the provider of the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.url': {'description': 'The URL of a given abuse feed, where applicable', 'type': 'URL'}, 'malware.hash.md5': {'description': 'A string depicting an MD5 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.hash.sha1': {'description': 'A string depicting a SHA1 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.hash.sha256': {'description': 'A string depicting a SHA256 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.name': {'description': 'The malware name in lower case.', 'regex': '^[ -~]+$', 'type': 'LowercaseString'}, 'malware.version': {'description': 'A version string for an identified 
artifact generation, e.g. a crime-ware kit.', 'regex': '^[ -~]+$', 'type': 'String'}, 'misp.attribute_uuid': {'description': 'MISP - Malware Information Sharing Platform & Threat Sharing UUID of an attribute.', 'length': 36, 'regex': '^[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}$', 'type': 'LowercaseString'}, 'misp.event_uuid': {'description': 'MISP - Malware Information Sharing Platform & Threat Sharing UUID.', 'length': 36, 'regex': '^[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[0-9a-z]{12}$', 'type': 'LowercaseString'}, 'output': {'description': 'Event data converted into foreign format, intended to be exported by output plugin.', 'type': 'JSON'}, 'protocol.application': {'description': 'e.g. vnc, ssh, sip, irc, http or smtp.', 'length': 100, 'regex': '^[ -~]+$', 'type': 'LowercaseString'}, 'protocol.transport': {'description': 'e.g. tcp, udp, icmp.', 'iregex': '^(ip|icmp|igmp|ggp|ipencap|st2|tcp|cbt|egp|igp|bbn-rcc|nvp(-ii)?|pup|argus|emcon|xnet|chaos|udp|mux|dcn|hmp|prm|xns-idp|trunk-1|trunk-2|leaf-1|leaf-2|rdp|irtp|iso-tp4|netblt|mfe-nsp|merit-inp|sep|3pc|idpr|xtp|ddp|idpr-cmtp|tp\\+\\+|il|ipv6|sdrp|ipv6-route|ipv6-frag|idrp|rsvp|gre|mhrp|bna|esp|ah|i-nlsp|swipe|narp|mobile|tlsp|skip|ipv6-icmp|ipv6-nonxt|ipv6-opts|cftp|sat-expak|kryptolan|rvd|ippc|sat-mon|visa|ipcv|cpnx|cphb|wsn|pvp|br-sat-mon|sun-nd|wb-mon|wb-expak|iso-ip|vmtp|secure-vmtp|vines|ttp|nsfnet-igp|dgp|tcf|eigrp|ospf|sprite-rpc|larp|mtp|ax.25|ipip|micp|scc-sp|etherip|encap|gmtp|ifmp|pnni|pim|aris|scps|qnx|a/n|ipcomp|snp|compaq-peer|ipx-in-ip|vrrp|pgm|l2tp|ddx|iatp|st|srp|uti|smp|sm|ptp|isis|fire|crtp|crdup|sscopmce|iplt|sps|pipe|sctp|fc|divert)$', 'length': 11, 'type': 'LowercaseString'}, 'raw': {'description': 'The original line of the event from encoded in base64.', 'type': 'Base64'}, 'rtir_id': {'description': 'Request Tracker Incident Response ticket id.', 'type': 'Integer'}, 'screenshot_url': {'description': 'Some source may report URLs related to a an image generated of 
a resource without any metadata. Or an URL pointing to resource, which has been rendered into a webshot, e.g. a PNG image and the relevant metadata related to its retrieval/generation.', 'type': 'URL'}, 'source.abuse_contact': {'description': 'Abuse contact for source address. A comma separated list.', 'type': 'LowercaseString'}, 'source.account': {'description': 'An account name or email address, which has been identified to relate to the source of an abuse event.', 'type': 'String'}, 'source.allocated': {'description': 'Allocation date corresponding to BGP prefix.', 'type': 'DateTime'}, 'source.as_name': {'description': 'The autonomous system name from which the connection originated.', 'type': 'String'}, 'source.asn': {'description': 'The autonomous system number from which originated the connection.', 'type': 'ASN'}, 'source.domain_suffix': {'description': 'The suffix of the domain from the public suffix list.', 'type': 'FQDN'}, 'source.fqdn': {'description': 'A DNS name related to the host from which the connection originated. DNS allows even binary data in DNS, so we have to allow everything. 
A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'source.geolocation.cc': {'description': 'Country-Code according to ISO3166-1 alpha-2 for the source IP.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.city': {'description': 'Some geolocation services refer to city-level geolocation.', 'type': 'String'}, 'source.geolocation.country': {'description': 'The country name derived from the ISO3166 country code (assigned to cc field).', 'type': 'String'}, 'source.geolocation.cymru_cc': {'description': 'The country code denoted for the ip by the Team Cymru asn to ip mapping service.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.geoip_cc': {'description': 'MaxMind Country Code (ISO3166-1 alpha-2).', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.latitude': {'description': 'Latitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'source.geolocation.longitude': {'description': 'Longitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'source.geolocation.region': {'description': 'Some geolocation services refer to region-level geolocation.', 'type': 'String'}, 'source.geolocation.state': {'description': 'Some geolocation services refer to state-level geolocation.', 'type': 'String'}, 'source.ip': {'description': 'The ip observed to initiate the connection', 'type': 'IPAddress'}, 'source.local_hostname': {'description': 'Some sources report a internal hostname within a NAT related to the name configured for a compromised system', 'type': 'String'}, 'source.local_ip': {'description': 'Some sources report a internal (NATed) IP address related a compromised system. N.B. RFC1918 IPs are OK here.', 'type': 'IPAddress'}, 'source.network': {'description': 'CIDR for an autonomous system. 
Also known as BGP prefix. If multiple values are possible, select the most specific.', 'type': 'IPNetwork'}, 'source.port': {'description': 'The port from which the connection originated.', 'length': 5, 'type': 'Integer'}, 'source.registry': {'description': 'The IP registry a given ip address is allocated by.', 'length': 7, 'type': 'Registry'}, 'source.reverse_dns': {'description': 'Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'source.tor_node': {'description': 'If the source IP was a known tor node.', 'type': 'Boolean'}, 'source.url': {'description': 'A URL denotes an IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource.', 'type': 'URL'}, 'source.urlpath': {'description': 'The path portion of an HTTP or related network request.', 'type': 'String'}, 'status': {'description': 'Status of the malicious resource (phishing, dropzone, etc), e.g. online, offline.', 'type': 'String'}, 'time.observation': {'description': 'The time the collector of the local instance processed (observed) the event.', 'type': 'DateTime'}, 'time.source': {'description': 'The time of occurrence of the event as reported the feed (source).', 'type': 'DateTime'}, 'tlp': {'description': 'Traffic Light Protocol level of the event.', 'type': 'TLP'}}, 'report': {'extra': {'description': 'All anecdotal information of the report, which cannot be parsed into the data harmonization elements. E.g. subject of mails, etc. 
This is data is not automatically propagated to the events.', 'type': 'JSONDict'}, 'feed.accuracy': {'description': 'A float between 0 and 100 that represents how accurate the data in the feed is', 'type': 'Accuracy'}, 'feed.code': {'description': 'Code name for the feed, e.g. DFGS, HSDAG etc.', 'length': 100, 'type': 'String'}, 'feed.documentation': {'description': 'A URL or hint where to find the documentation of this feed.', 'type': 'String'}, 'feed.name': {'description': 'Name for the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.provider': {'description': 'Name for the provider of the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.url': {'description': 'The URL of a given abuse feed, where applicable', 'type': 'URL'}, 'raw': {'description': 'The original raw and unparsed data encoded in base64.', 'type': 'Base64'}, 'rtir_id': {'description': 'Request Tracker Incident Response ticket id.', 'type': 'Integer'}, 'time.observation': {'description': 'The time the collector of the local instance processed (observed) the event.', 'type': 'DateTime'}}}
property input_queue

Returns the input queue of this bot which can be filled with fixture data in setUp()

new_event()
new_report(auto=False, examples=False)
prepare_bot(parameters={}, destination_queues=None)

Reconfigures the bot with the changed attributes.

Parameters
  • parameters – optional bot parameters for this run, as dict

  • destination_queues – optional definition of destination queues; default: {“_default”: “{}-output”.format(self.bot_id)}

prepare_source_queue()
run_bot(iterations: int = 1, error_on_pipeline: bool = False, prepare=True, parameters={}, allowed_error_count=0, allowed_warning_count=0, stop_bot: bool = True)

Call this method for actually doing a test run for the specified bot.

Parameters
  • iterations – Bot instance will be run the given times, defaults to 1.

  • parameters – passed to prepare_bot

  • allowed_error_count – maximum number of allowed errors in the logs

  • allowed_warning_count – maximum number of allowed warnings in the logs

  • stop_bot – If the bot should be stopped/shut down after running it. Set to False if you are calling this method again afterwards, as the bot shutdown destroys structures (pipeline, etc.)

classmethod setUpClass()

Set default values and save original functions.

set_input_queue(seq)

Setter for the input queue of this bot

tearDown()

Check if the bot did consume all messages.

Executed after every test run.

classmethod tearDownClass()
test_bot_name(*args, **kwargs)

Test if Bot has a valid name. Must be CamelCase and end with CollectorBot etc.

Accept arbitrary arguments in case the test methods get mocked and get some additional arguments. All arguments are ignored.

test_static_bot_check_method(*args, **kwargs)

Check if the bot’s static check() method completes without errors (exceptions). The return value (errors) is not checked.

The arbitrary parameters for this test function are needed because if a mocker mocks the test class, parameters can be added. See for example intelmq.tests.bots.collectors.http.test_collector.

intelmq.lib.upgrades module

© 2020 Sebastian Wagner <wagner@cert.at>

SPDX-License-Identifier: AGPL-3.0-or-later

intelmq.lib.upgrades.v100_dev7_modify_syntax(defaults, runtime, harmonization, dry_run)

Migrate modify bot configuration format

intelmq.lib.upgrades.v110_deprecations(defaults, runtime, harmonization, dry_run)

Checking for deprecated runtime configurations (stomp collector, cymru parser, ripe expert, collector feed parameter)

intelmq.lib.upgrades.v110_shadowserver_feednames(defaults, runtime, harmonization, dry_run)

Replace deprecated Shadowserver feednames

intelmq.lib.upgrades.v111_defaults_process_manager(defaults, runtime, harmonization, dry_run)

Fix typo in proccess_manager parameter
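
As an illustration of the shared signature of these upgrade functions, here is a minimal sketch of what such a migration could look like. It is not the library's implementation: the function name is hypothetical, and the return convention (a tuple of a change indicator plus the three configurations, with None meaning "nothing changed") is an assumption that this page does not state.

```python
def v111_defaults_process_manager_sketch(defaults, runtime, harmonization, dry_run):
    """Rename the misspelled key 'proccess_manager' to 'process_manager'.

    dry_run is accepted only to match the documented signature; this sketch
    edits the in-memory dictionaries and never writes any file.
    """
    changed = None  # assumption: None signals that nothing had to be done
    if 'proccess_manager' in defaults:
        value = defaults.pop('proccess_manager')
        # Keep an already correct entry, otherwise take over the misspelled one.
        defaults.setdefault('process_manager', value)
        changed = True
    return changed, defaults, runtime, harmonization


changed, defaults, runtime, harmonization = v111_defaults_process_manager_sketch(
    {'proccess_manager': 'intelmq'}, runtime={}, harmonization={}, dry_run=False)
print(changed, defaults)
```

Calling the sketch a second time on its own output leaves the configuration untouched, which is the property one would want from any migration step.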

intelmq.lib.upgrades.v112_feodo_tracker_domains(defaults, runtime, harmonization, dry_run)

Search for discontinued feodotracker domains feed

intelmq.lib.upgrades.v112_feodo_tracker_ips(defaults, runtime, harmonization, dry_run)

Fix URL of feodotracker IPs feed in runtime configuration

intelmq.lib.upgrades.v200_defaults_broker(defaults, runtime, harmonization, dry_run)

Insert *_pipeline_broker into, and delete broker from, the defaults configuration

intelmq.lib.upgrades.v200_defaults_ssl_ca_certificate(defaults, runtime, harmonization, dry_run)

Add ssl_ca_certificate to defaults

intelmq.lib.upgrades.v200_defaults_statistics(defaults, runtime, harmonization, dry_run)

Inserting statistics_* parameters into defaults configuration file

intelmq.lib.upgrades.v202_fixes(defaults, runtime, harmonization, dry_run)

Migrate the collector parameter feed to name. For the RIPE expert, set query_ripe_stat_ip with query_ripe_stat_asn as its default. For the Cymru Whois expert, set overwrite to true.

intelmq.lib.upgrades.v210_deprecations(defaults, runtime, harmonization, dry_run)

Migrating configuration

intelmq.lib.upgrades.v213_deprecations(defaults, runtime, harmonization, dry_run)

Migrate attach_unzip to extract_files for the mail attachment collector

intelmq.lib.upgrades.v213_feed_changes(defaults, runtime, harmonization, dry_run)

Migrates feed configuration for changed feed parameters.

intelmq.lib.upgrades.v220_azure_collector(defaults, runtime, harmonization, dry_run)

Checking for the Microsoft Azure collector

intelmq.lib.upgrades.v220_configuration(defaults, runtime, harmonization, dry_run)

Migrating configuration

intelmq.lib.upgrades.v220_feed_changes(defaults, runtime, harmonization, dry_run)

Migrates feed configuration for changed feed parameters.

intelmq.lib.upgrades.v221_feed_changes(defaults, runtime, harmonization, dry_run)

Migrates feeds’ configuration for changed/fixed parameters. Deprecation of HP Hosts file feed & parser.

intelmq.lib.upgrades.v222_feed_changes(defaults, runtime, harmonization, dry_run)

Migrate Shadowserver feed name

intelmq.lib.upgrades.v230_csv_parser_parameter_fix(defaults, runtime, harmonization, dry_run)

Fix CSV parser parameter misspelling

intelmq.lib.upgrades.v230_deprecations(defaults, runtime, harmonization, dry_run)

Deprecate malwaredomainlist parser

intelmq.lib.upgrades.v230_feed_changes(defaults, runtime, harmonization, dry_run)

Migrates feeds’ configuration for changed/fixed parameter

intelmq.lib.upgrades.v233_feodotracker_browse(defaults, runtime, harmonization, dry_run)

Migrate Abuse.ch Feodotracker Browser feed parsing parameters

intelmq.lib.upgrades.v300_bots_file_removal(defaults, runtime, harmonization, dry_run)

Remove BOTS file

intelmq.lib.upgrades.v300_defaults_file_removal(defaults, runtime, harmonization, dry_run)

Remove the defaults.conf file

intelmq.lib.upgrades.v300_pipeline_file_removal(defaults, runtime, harmonization, dry_run)

Remove the pipeline.conf file

intelmq.lib.upgrades.v301_deprecations(defaults, runtime, harmonization, dry_run)

Deprecate malwaredomains parser and collector

intelmq.lib.utils module

Common utility functions for intelmq.

  • decode

  • encode

  • base64_decode

  • base64_encode

  • load_configuration

  • log

  • reverse_readline

  • parse_logline

class intelmq.lib.utils.RewindableFileHandle(f)

Bases: object

Can be used for easy retrieval of last input line to populate raw field during CSV parsing.
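
The idea can be sketched with a small wrapper that remembers the most recent line handed to the CSV reader. This is an illustrative sketch, not the library class: the class name and the attribute name current_line are assumptions.

```python
import csv
import io


class RewindableFileHandleSketch:
    """Iterate over a file-like object and remember the last line read."""

    def __init__(self, f):
        self.f = f
        self.current_line = None  # hypothetical attribute name

    def __iter__(self):
        return self

    def __next__(self):
        # StopIteration raised by the wrapped handle ends the iteration as usual.
        self.current_line = next(self.f)
        return self.current_line


handle = RewindableFileHandleSketch(
    io.StringIO('1.2.3.4,malware\n5.6.7.8,phishing\n'))
rows = []
for row in csv.reader(handle):
    # handle.current_line is the raw text of the row that was just parsed.
    rows.append((row, handle.current_line))
print(rows)
```

Because csv.reader pulls lines lazily, the remembered line matches the parsed row as long as a record does not span several physical lines.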

intelmq.lib.utils.base64_decode(value: Union[bytes, str]) str
Parameters

value – base64 encoded string

Returns

decoded string

Return type

retval

Notes

Possible bytes/unicode conversion problems are ignored.

intelmq.lib.utils.base64_encode(value: Union[bytes, str]) str
Parameters

value – string to be encoded

Returns

base64 representation of value

Return type

retval

Notes

Possible bytes/unicode conversion problems are ignored.
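
To illustrate the behaviour described for the two base64 helpers (bytes or str in, str out, conversion problems ignored), here is a rough equivalent built on the standard library. The function names are hypothetical and the library's actual implementation may differ.

```python
import base64


def base64_encode_sketch(value):
    """Return the base64 representation of value as a str."""
    if isinstance(value, str):
        value = value.encode('utf-8')
    return base64.b64encode(value).decode('utf-8', 'ignore')


def base64_decode_sketch(value):
    """Return the decoded text; bytes that are not valid UTF-8 are dropped."""
    return base64.b64decode(value).decode('utf-8', 'ignore')


encoded = base64_encode_sketch('intelmq')
decoded = base64_decode_sketch(encoded)
print(encoded, decoded)
```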

intelmq.lib.utils.decode(text: Union[bytes, str], encodings: Sequence[str] = ('utf-8',), force: bool = False) str

Decode the given bytes to a unicode string, using UTF-8 by default.

Parameters
  • text – if unicode string is given, same object is returned

  • encodings – list/tuple of encodings to use

  • force – Ignore invalid characters

Returns

converted unicode string

Raises

ValueError – if decoding failed

intelmq.lib.utils.encode(text: Union[bytes, str], encodings: Sequence[str] = ('utf-8',), force: bool = False) bytes

Encode the given string to bytes, using UTF-8 by default.

Parameters
  • text – if bytes string is given, same object is returned

  • encodings – list/tuple of encodings to use

  • force – Ignore invalid characters

Returns

converted bytes string

Raises

ValueError – if encoding failed
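
The documented behaviour of decode and encode (return the input unchanged if it already has the target type, try each encoding in turn, optionally ignore invalid characters, otherwise raise ValueError) can be sketched as follows. The function names are hypothetical, and the order in which the force fallback is applied is an assumption.

```python
def decode_sketch(text, encodings=('utf-8',), force=False):
    """Return text as str, trying the given encodings in order."""
    if isinstance(text, str):
        return text
    for encoding in encodings:
        try:
            return text.decode(encoding)
        except ValueError:  # UnicodeDecodeError is a subclass of ValueError
            pass
    if force:
        for encoding in encodings:
            try:
                return text.decode(encoding, 'ignore')
            except ValueError:
                pass
    raise ValueError('Could not decode string with given encodings %r.'
                     % (encodings,))


def encode_sketch(text, encodings=('utf-8',), force=False):
    """Return text as bytes, trying the given encodings in order."""
    if isinstance(text, bytes):
        return text
    for encoding in encodings:
        try:
            return text.encode(encoding)
        except ValueError:  # UnicodeEncodeError is a subclass of ValueError
            pass
    if force:
        for encoding in encodings:
            try:
                return text.encode(encoding, 'ignore')
            except ValueError:
                pass
    raise ValueError('Could not encode string with given encodings %r.'
                     % (encodings,))


print(decode_sketch(b'\xe9', ('utf-8', 'latin-1')))
print(encode_sketch('caf\u00e9'))
```

Listing a permissive encoding such as latin-1 last gives a fallback without discarding data, whereas force=True silently drops whatever cannot be converted.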

intelmq.lib.utils.error_message_from_exc(exc: Exception) str
>>> exc = IndexError('This is a test')
>>> error_message_from_exc(exc)
'This is a test'
Parameters

exc

Returns

The error message of exc

Return type

result

intelmq.lib.utils.file_name_from_response(response: requests.models.Response) str

Extract the file name from the Content-Disposition header of the Response object or the URL as fallback

Parameters

response – a Response object retrieved from a call with the requests library

Returns

The file name

Return type

file_name
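
The header-then-URL fallback can be sketched without the requests library by passing the headers and the URL directly. The function name and its two-argument signature are hypothetical; the real function takes a Response object.

```python
import re


def file_name_from_headers_sketch(headers, url):
    """Prefer the Content-Disposition file name, fall back to the URL."""
    try:
        return re.findall('filename=(.+)', headers['Content-Disposition'])[0]
    except (KeyError, IndexError):
        # Header missing or without a filename part: use the last URL segment.
        return url.split('/')[-1]


from_header = file_name_from_headers_sketch(
    {'Content-Disposition': 'attachment; filename=report.csv'},
    'https://example.com/download?id=1')
from_url = file_name_from_headers_sketch({}, 'https://example.com/feeds/list.txt')
print(from_header, from_url)
```

This simple pattern keeps any quotes around the file name, so a caller that needs a clean name would have to strip them.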

intelmq.lib.utils.get_global_settings() dict
intelmq.lib.utils.list_all_bots() dict

Compile a dictionary with all bots and their parameters.

Includes:
  • the bots’ names

  • the description from the docstring

  • parameters including default values

For the parameters, parameters of the Bot class are excluded if they have the same value.

intelmq.lib.utils.load_configuration(configuration_filepath: str) dict

Load JSON or YAML configuration file.

Parameters

configuration_filepath – Path to file to load.

Returns

Parsed configuration

Return type

config

Raises

ValueError – if file not found
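
A minimal sketch of the documented contract, restricted to JSON so that it needs only the standard library (YAML handling is left out on purpose). The function name, the file name and the example content are hypothetical.

```python
import json
import os
import tempfile


def load_json_configuration_sketch(configuration_filepath):
    """Parse a JSON configuration file; raise ValueError if it is missing."""
    if not os.path.exists(configuration_filepath):
        raise ValueError('File not found: %r.' % configuration_filepath)
    with open(configuration_filepath) as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, 'runtime.json')
    with open(path, 'w') as handle:
        json.dump({'example-bot': {'parameters': {'rate_limit': 60}}}, handle)
    config = load_json_configuration_sketch(path)
print(config)
```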

intelmq.lib.utils.load_parameters(*configs: dict) intelmq.lib.utils.Parameters

Load dictionaries into new Parameters() instance.

Parameters

*configs – Arbitrary number of dictionaries to load.

Returns

class instance with items of configs as attributes

Return type

parameters
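
The merge of several dictionaries into one attribute container might look like the sketch below. The class and function names are hypothetical, and the rule that later dictionaries override earlier ones is an assumption.

```python
class ParametersSketch:
    """Plain attribute container standing in for the Parameters class."""


def load_parameters_sketch(*configs):
    """Copy the items of all given dictionaries onto one new instance."""
    parameters = ParametersSketch()
    for config in configs:
        for key, value in config.items():
            # Assumption: a later dictionary overrides an earlier one.
            setattr(parameters, key, value)
    return parameters


parameters = load_parameters_sketch(
    {'rate_limit': 0, 'error_max_retries': 3},
    {'rate_limit': 60})
print(parameters.rate_limit, parameters.error_max_retries)
```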

intelmq.lib.utils.log(name: str, log_path: Union[str, bool] = '/opt/intelmq/var/log/', log_level: str = 'INFO', stream: Optional[object] = None, syslog: Optional[Union[bool, str, list, tuple]] = None, log_format_stream: str = '%(name)s: %(message)s', logging_level_stream: Optional[str] = None, log_max_size: Optional[int] = 0, log_max_copies: Optional[int] = None)
intelmq.lib.utils.parse_logline(logline: str, regex: str = '^(?P<date>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d+) - (?P<bot_id>([-\\w]+|py\\.warnings))(?P<thread_id>\\.[0-9]+)? - (?P<log_level>[A-Z]+) - (?P<message>.+)$') Union[dict, str]

Parses the given logline string into its components.

Parameters
  • logline – logline to be parsed

  • regex – The regular expression used to parse the line

Returns

dictionary with keys: [‘date’, ‘bot_id’, ‘log_level’, ‘message’]

or string if the line can’t be parsed

Return type

result

See also

LOG_REGEX: Regular expression for default log format of file handler

SYSLOG_REGEX: Regular expression for log format of syslog
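
Using the default regular expression from the signature above, the parsing step can be sketched with the standard re module. The function name and the sample log line are hypothetical, and returning the unchanged line on failure is an assumption (the documentation only says that a string is returned).

```python
import re

# Default pattern copied from the parse_logline signature.
LOG_REGEX_SKETCH = (
    r'^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+) - '
    r'(?P<bot_id>([-\w]+|py\.warnings))(?P<thread_id>\.[0-9]+)? - '
    r'(?P<log_level>[A-Z]+) - (?P<message>.+)$')


def parse_logline_sketch(logline, regex=LOG_REGEX_SKETCH):
    """Split a log line into the four documented fields."""
    match = re.match(regex, logline)
    if match is None:
        return logline  # assumption: hand back the unparsed line
    return {key: match.group(key)
            for key in ('date', 'bot_id', 'log_level', 'message')}


parsed = parse_logline_sketch(
    '2015-05-29 21:00:24,379 - example-collector - ERROR - Something went wrong')
unparsed = parse_logline_sketch('not a log line')
print(parsed)
print(unparsed)
```

Checking whether the result is a dict or a str is how a caller would distinguish a parsed line from an unparseable one.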

intelmq.lib.utils.parse_relative(relative_time: str) int

Parses a relative time specification and returns the corresponding number of minutes.

>>> parse_relative('4 hours')
240
Parameters

relative_time – a string holding a relative time specification

Returns

Minutes

Return type

result

Raises

ValueError – If relative_time is not parseable

See also

TIMESPANS: Defines the conversion of verbal timespans to minutes
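
The conversion can be sketched with a lookup table of minutes per unit. The table contents, the accepted input pattern and the function name are assumptions made for illustration; only the '4 hours' result of 240 is taken from the example above.

```python
import re

# Hypothetical stand-in for TIMESPANS: minutes per unit.
TIMESPANS_SKETCH = {
    'hour': 60,
    'day': 24 * 60,
    'week': 7 * 24 * 60,
}


def parse_relative_sketch(relative_time):
    """Convert strings such as '4 hours' into a number of minutes."""
    match = re.match(r'^\s*(\d+)\s+([a-z]+?)s?\s*$', relative_time)
    if match is None:
        raise ValueError('Could not parse %r.' % relative_time)
    value, unit = int(match.group(1)), match.group(2)
    if unit not in TIMESPANS_SKETCH:
        raise ValueError('Unknown time unit in %r.' % relative_time)
    return value * TIMESPANS_SKETCH[unit]


print(parse_relative_sketch('4 hours'))
```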

intelmq.lib.utils.reverse_readline(filename: str, buf_size=100000) Generator[str, None, None]

Module contents