intelmq.lib package¶
Subpackages¶
- intelmq.lib.mixins package
- Submodules
- intelmq.lib.mixins.cache module
- intelmq.lib.mixins.http module
HttpMixin
HttpMixin.http_get()
HttpMixin.http_header
HttpMixin.http_password
HttpMixin.http_proxy
HttpMixin.http_session()
HttpMixin.http_timeout_max_tries
HttpMixin.http_timeout_sec
HttpMixin.http_user_agent
HttpMixin.http_username
HttpMixin.http_verify_cert
HttpMixin.https_proxy
HttpMixin.setup()
HttpMixin.ssl_client_cert
TimeoutHTTPAdapter
- intelmq.lib.mixins.sql module
- intelmq.lib.mixins.stomp module
StompMixin
StompMixin.auth_by_ssl_client_certificate
StompMixin.heartbeat
StompMixin.password
StompMixin.port
StompMixin.prepare_stomp_connection()
StompMixin.server
StompMixin.ssl_ca_certificate
StompMixin.ssl_client_certificate
StompMixin.ssl_client_certificate_key
StompMixin.stomp_bot_parameters_check()
StompMixin.stomp_bot_runtime_initial_check()
StompMixin.username
- Module contents
CacheMixin
HttpMixin
HttpMixin.http_get()
HttpMixin.http_header
HttpMixin.http_password
HttpMixin.http_proxy
HttpMixin.http_session()
HttpMixin.http_timeout_max_tries
HttpMixin.http_timeout_sec
HttpMixin.http_user_agent
HttpMixin.http_username
HttpMixin.http_verify_cert
HttpMixin.https_proxy
HttpMixin.setup()
HttpMixin.ssl_client_cert
SQLMixin
StompMixin
StompMixin.auth_by_ssl_client_certificate
StompMixin.heartbeat
StompMixin.password
StompMixin.port
StompMixin.prepare_stomp_connection()
StompMixin.server
StompMixin.ssl_ca_certificate
StompMixin.ssl_client_certificate
StompMixin.ssl_client_certificate_key
StompMixin.stomp_bot_parameters_check()
StompMixin.stomp_bot_runtime_initial_check()
StompMixin.username
Submodules¶
intelmq.lib.bot module¶
- The bot library has the base classes for all bots.
Bot: generic base class for all kinds of bots
CollectorBot: base class for collectors
ParserBot: base class for parsers
- class intelmq.lib.bot.Bot(bot_id: str, start: bool = False, sighup_event=None, disable_multithreading: bool = None, settings: dict | None = None, source_queue: str | None = None, standalone: bool = False)¶
Bases:
object
Not to be reset when initialized again on reload.
- classmethod _create_argparser()¶
See https://github.com/certtools/intelmq/pull/1524/files#r464606370 for why this code is not in the constructor.
- _parse_common_parameters()¶
Parses and sanitizes commonly used parameters:
extract_files
- _parse_extract_file_parameter(parameter_name: str = 'extract_files')¶
Parses and sanitizes commonly used parameters:
extract_files
- accuracy: int = 100¶
- acknowledge_message()¶
Acknowledges that the last message has been processed, if any.
For bots without source pipeline (collectors), this is a no-op.
- static check(parameters: dict) List[List[str]] | None ¶
The bot’s own check function can perform individual checks on its parameters. init() is not called beforehand; this is a static method which does not require class initialization.
- Parameters:
parameters – Bot’s parameters, defaults and runtime merged together
- Returns:
- None or a list of [log_level, log_message] pairs, both
strings. log_level must be a valid log level.
- Return type:
output
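For illustration, a static check implementation might look like the following sketch; the parameter name api_key is hypothetical and not part of any real bot:

```python
from typing import List, Optional


def check(parameters: dict) -> Optional[List[List[str]]]:
    """Return None if everything is fine, else [log_level, log_message] pairs.

    Mirrors the documented contract: log_level must be a valid log level name.
    """
    results = []
    if not parameters.get('api_key'):  # hypothetical parameter for illustration
        results.append(['error', "Parameter 'api_key' is missing or empty."])
    return results or None
```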
- description: str | None = None¶
- destination_pipeline_broker: str = 'redis'¶
- destination_pipeline_db: int = 2¶
- destination_pipeline_host: str = '127.0.0.1'¶
- destination_pipeline_password: str | None = None¶
- destination_pipeline_port: int = 6379¶
- destination_queues: dict = {}¶
- enabled: bool = True¶
- error_dump_message: bool = True¶
- error_log_exception: bool = True¶
- error_log_message: bool = False¶
- error_max_retries: int = 3¶
- error_procedure: str = 'pass'¶
- error_retry_delay: int = 15¶
- group: str | None = None¶
- property harmonization¶
- http_proxy: str | None = None¶
- http_timeout_max_tries: int = 3¶
- http_timeout_sec: int = 30¶
- http_user_agent: str = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2228.0 Safari/537.36'¶
- http_verify_cert: bool | str = True¶
- https_proxy: str | None = None¶
- init()¶
- instances_threads: int = 0¶
- is_multithreaded: bool = False¶
- load_balance: bool = False¶
- log_processed_messages_count: int = 500¶
- log_processed_messages_seconds: int = 900¶
- logger = None¶
- logging_handler: str = 'file'¶
- logging_level: str = 'INFO'¶
- logging_path: str = '/opt/intelmq/var/log/'¶
- logging_syslog: str = '/dev/log'¶
- module = None¶
- name: str | None = None¶
- new_event(*args, **kwargs)¶
- process_manager: str = 'intelmq'¶
- process_message(*messages: Message | dict)¶
Call the bot’s process method with a prepared source queue. The return value is a dict with the complete pipeline state. Multiple messages can be given as positional arguments. The pipeline needs to be configured accordingly with BotLibSettings, see https://intelmq.readthedocs.io/en/develop/dev/library.html
Access the output queue e.g. with return_value[‘output’]
- rate_limit: int = 0¶
- receive_message() Message ¶
If the bot is reloaded while waiting for an incoming message, the received message will first be returned to the pipeline in order to get to a clean state. Then, after reloading, the message will be retrieved again.
- classmethod run(parsed_args=None)¶
- run_mode: str = 'continuous'¶
- send_message(*messages, path: str = '_default', auto_add=None, path_permissive: bool = False)¶
- Parameters:
messages – Instances of intelmq.lib.message.Message class
auto_add – ignored
path_permissive – If true, do not raise an error if the path is not configured
- set_request_parameters()¶
- shutdown()¶
- source_pipeline_broker: str = 'redis'¶
- source_pipeline_db: int = 2¶
- source_pipeline_host: str = '127.0.0.1'¶
- source_pipeline_password: str | None = None¶
- source_pipeline_port: int = 6379¶
- source_queue: str | None = None¶
- ssl_ca_certificate: str | None = None¶
- start(starting: bool = True, error_on_pipeline: bool = True, error_on_message: bool = False, source_pipeline: str | None = None, destination_pipeline: str | None = None)¶
- statistics_database: int = 3¶
- statistics_host: str = '127.0.0.1'¶
- statistics_password: str | None = None¶
- statistics_port: int = 6379¶
- stop(exitcode: int = 1)¶
- class intelmq.lib.bot.CollectorBot(*args, **kwargs)¶
Bases:
Bot
Base class for collectors.
Performs some sanity checks on message sending.
- accuracy: int = 100¶
- bottype = 'Collector'¶
- code: str | None = None¶
- documentation: str | None = None¶
- name: str | None = None¶
- new_report()¶
- provider: str | None = None¶
- send_message(*messages, path: str = '_default', auto_add: bool = True)¶
- Parameters:
messages – Instances of intelmq.lib.message.Message class
path – Named queue the message will be sent to
auto_add – Add some default report fields from parameters
- class intelmq.lib.bot.ExpertBot(*args, **kwargs)¶
Bases:
Bot
Base class for expert bots.
- bottype = 'Expert'¶
- class intelmq.lib.bot.OutputBot(*args, **kwargs)¶
Bases:
Bot
Base class for outputs.
- bottype = 'Output'¶
- export_event(event: Event, return_type: type | None = None) str | dict ¶
- exports an event according to the following parameters:
message_hierarchical
message_with_type
message_jsondict_as_string
single_key
keep_raw_field
- Parameters:
return_type – Ensure that the returned value is of the given type. Optional. For example: str If the resulting value is not an instance of this type, the given object is called with the value as parameter E.g. str(retval)
- class intelmq.lib.bot.ParserBot(*args, **kwargs)¶
Bases:
Bot
- _get_io_and_save_line_ending(raw: str) StringIO ¶
Prepare StringIO and save the original line ending
The line ending is saved in self._line_ending. The default value is \r\n, the same default used by the csv module.
- bottype = 'Parser'¶
- default_fields: dict | None = {}¶
- parse(report: Report)¶
A generator yielding the single elements of the data.
Comments, headers etc. can be processed here. Data needed by self.parse_line can be saved in self.tempdata (list).
Default parser yields stripped lines. Override for your use or use an existing parser, e.g.:
parse = ParserBot.parse_csv
- You should do the same for recovering lines, e.g.:
recover_line = ParserBot.recover_line_csv
- parse_csv_dict(report: Report)¶
A basic CSV Dictionary parser. The resulting lines are dictionaries with the column names as keys.
- parse_line(line: Any, report: Report)¶
A generator which can yield one or more messages contained in line.
Report has the full message, thus you can access some metadata. Override for your use.
- process()¶
- recover_line(line: str | None = None) str ¶
Reverse of “parse” for single lines.
Recovers a fully functional report with only the problematic line by concatenating all strings in “self.tempdata” with “line” with LF newlines. Works fine for most text files.
- Parameters:
line (Optional[str], optional) – The currently processed line which should be transferred into its original appearance. As fallback, “self._current_line” is used if available (depending on self.parse). The default is None.
- Raises:
ValueError – If neither the parameter “line” nor the member “self._current_line” is available.
- Returns:
- str
The reconstructed raw data.
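The concatenation described above can be sketched as a small stand-alone function (a simplified stand-in, not the actual ParserBot method):

```python
from typing import List


def recover_line(tempdata: List[str], line: str) -> str:
    # Concatenate all saved header/comment strings with the problematic
    # line, joined by LF newlines, as described for ParserBot.recover_line.
    return '\n'.join(tempdata + [line])
```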
- recover_line_csv(line: list | None = None) str ¶
Recover csv line, respecting saved line ending.
- Parameter:
line: Optional line as list. If absent, the current line is used as string.
- recover_line_csv_dict(line: dict | str | None = None) str ¶
Converts dictionaries to csv. self.csv_fieldnames must be list of fields. Respect saved line ending.
- recover_line_json(line: dict) str ¶
Reverse of parse for JSON pulses.
Recovers a fully functional report with only the problematic pulse. Using a string as input here is not possible, as the input may span over multiple lines. Output is not identical to the input, but has the same content.
- Parameters:
dict. (The line as) –
- Returns:
The JSON-encoded line as string.
- Return type:
str
- recover_line_json_stream(line: str | None = None) str ¶
recover_line for JSON streams (one JSON element per line, no outer structure), just returns the current line, unparsed.
- Parameters:
line – The line itself as dict, if available, falls back to original current line
- Returns:
unparsed JSON line.
- Return type:
str
intelmq.lib.bot_debugger module¶
Utilities for debugging intelmq bots.
BotDebugger is called via intelmqctl. It starts a live running bot instance, raises the logging level to DEBUG and allows even a less experienced programmer, who may be puzzled by Python nuances and server deployment twists, to see what is happening in the bot and where the error lies.
- Depending on the subcommand received, the class either
starts the bot as is (default)
processes a single message, either injected or from the default pipeline (process subcommand)
reads a message from the input pipeline or sends a message to the output pipeline (message subcommand)
- class intelmq.lib.bot_debugger.BotDebugger(runtime_configuration, bot_id, run_subcommand=None, console_type=None, message_kind=None, dryrun=None, msg=None, show=None, loglevel=None)¶
Bases:
object
- EXAMPLE = '\nThe message may look like:\n \'{"source.network": "178.72.192.0/18", "time.observation": "2017-05-12T05:23:06+00:00"}\' '¶
- arg2msg(msg)¶
- instance = None¶
- leverageLogger(level)¶
- load_configuration() dict ¶
Load JSON or YAML configuration file.
- Parameters:
configuration_filepath – Path to file to load.
- Returns:
Parsed configuration
- Return type:
config
- Raises:
ValueError – if file not found
- static load_configuration_patch(configuration_filepath: str, *args, **kwargs) dict ¶
Mock function for utils.load_configuration which ensures the logging level parameter is set to the value we want. If a runtime configuration is detected, the logging_level parameter is inserted in all bots’ parameters (bot_id is not accessible here, hence we add it everywhere) as well as in the global parameters (ex-defaults). Maybe not everything is necessary, but this makes sure logging_level is set everywhere it might be relevant, also in the future.
- logging_level = None¶
- messageWizzard(msg)¶
- output = []¶
- outputappend(msg)¶
- static pprint(msg) str ¶
We can’t use the standard pprint, as the JSON standard requires double quotes.
- run() str ¶
intelmq.lib.cache module¶
Cache is a set of information already seen by the system. It provides a way, for example, to remove duplicated events and reports in the system, or to cache results from experts like Cymru Whois. A TTL value can be defined for each piece of information inserted into the cache; the TTL determines how long the system keeps that information.
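As a conceptual sketch of the TTL behaviour only (IntelMQ’s actual Cache class is backed by Redis, which handles key expiry natively):

```python
import time


class TTLCache:
    """Illustrative in-memory cache with per-entry TTL; not the real Cache API."""

    def __init__(self):
        self._store = {}

    def set(self, key, value, ttl: float) -> None:
        # Store the value together with its absolute expiry time
        self._store[key] = (value, time.monotonic() + ttl)

    def get(self, key):
        item = self._store.get(key)
        if item is not None and item[1] > time.monotonic():
            return item[0]
        # Expired or missing: drop the entry and report a miss
        self._store.pop(key, None)
        return None
```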
intelmq.lib.datatypes module¶
- class intelmq.lib.datatypes.BotType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
str
, Enum
- COLLECTOR = 'Collector'¶
- EXPERT = 'Expert'¶
- OUTPUT = 'Output'¶
- PARSER = 'Parser'¶
- _generate_next_value_(start, count, last_values)¶
Generate the next value when not given.
name: the name of the member
start: the initial start value or None
count: the number of existing members
last_values: the list of values assigned
- toJson()¶
- class intelmq.lib.datatypes.LogLevel(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
Enum
- CRITICAL = 4¶
- DEBUG = 0¶
- ERROR = 3¶
- INFO = 1¶
- WARNING = 2¶
- class intelmq.lib.datatypes.ReturnType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)¶
Bases:
str
, Enum
- JSON = 'Json'¶
- PYTHON = 'Python'¶
- TEXT = 'Text'¶
- _generate_next_value_(start, count, last_values)¶
Generate the next value when not given.
name: the name of the member
start: the initial start value or None
count: the number of existing members
last_values: the list of values assigned
- toJson()¶
- class intelmq.lib.datatypes.TimeFormat(value: str | None = None)¶
Bases:
str
Pydantic style Field Type class for bot parameter time_format. Used for validation.
- parse_datetime(value: str, return_datetime: bool = False) datetime | str ¶
This function uses the selected conversion function to parse the datetime value.
- Parameters:
value – external datetime string
return_datetime – whether to return string or datetime object
- Returns:
parsed datetime or string
- static validate(value: str) [Callable, Optional[str]] ¶
This function validates the time_format parameter value.
- Parameters:
value – bot parameter for datetime conversion
- Returns:
correct time conversion function and the format string
intelmq.lib.exceptions module¶
IntelMQ Exception Class
- exception intelmq.lib.exceptions.ConfigurationError(config: str, argument: str)¶
Bases:
IntelMQException
- exception intelmq.lib.exceptions.IntelMQException(message)¶
Bases:
Exception
- exception intelmq.lib.exceptions.IntelMQHarmonizationException(message)¶
Bases:
IntelMQException
- exception intelmq.lib.exceptions.InvalidArgument(argument: Any, got: Any = None, expected=None, docs: str = None)¶
Bases:
IntelMQException
- exception intelmq.lib.exceptions.InvalidKey(key: str, additional_text: str | None = None)¶
Bases:
IntelMQHarmonizationException
, KeyError
- exception intelmq.lib.exceptions.InvalidValue(key: str, value: str, reason: Any = None, object: bytes = None)¶
- exception intelmq.lib.exceptions.KeyExists(key: str)¶
- exception intelmq.lib.exceptions.KeyNotExists(key: str)¶
- exception intelmq.lib.exceptions.MissingDependencyError(dependency: str, version: str | None = None, installed: str | None = None, additional_text: str | None = None)¶
Bases:
IntelMQException
A missing dependency was detected. Log instructions on installation.
- __init__(dependency: str, version: str | None = None, installed: str | None = None, additional_text: str | None = None)¶
- Parameters:
dependency (str) – The dependency name.
version (Optional[str], optional) – The required version. The default is None.
installed (Optional[str], optional) – The currently installed version. Requires ‘version’ to be given The default is None.
additional_text (Optional[str], optional) – Arbitrary additional text to show. The default is None.
- Returns:
with prepared text
- Return type:
- exception intelmq.lib.exceptions.PipelineError(argument: str | Exception)¶
Bases:
IntelMQException
intelmq.lib.harmonization module¶
The following types are implemented with sanitize() and is_valid() functions:
Base64
Boolean
ClassificationTaxonomy
ClassificationType
DateTime
FQDN
Float
Accuracy
GenericType
IPAddress
IPNetwork
Integer
JSON
JSONDict
LowercaseString
Registry
String
URL
ASN
UppercaseString
TLP
- class intelmq.lib.harmonization.ASN¶
Bases:
Integer
ASN type. Derived from Integer with forbidden values.
Only valid are: 0 &lt; asn &lt;= 4294967295. See https://en.wikipedia.org/wiki/Autonomous_system_(Internet): “The first and last ASNs of the original 16-bit integers, namely 0 and 65,535, and the last ASN of the 32-bit numbers, namely 4,294,967,295 are reserved and should not be used by operators.”
- static check_asn(value: int) bool ¶
- static is_valid(value: int, sanitize: bool = False) bool ¶
- static sanitize(value: int) int | None ¶
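The validity rule stated above can be expressed directly; this is a sketch mirroring the documented range check, not the actual method:

```python
def check_asn(value: int) -> bool:
    # Follows the documented rule: 0 < asn <= 4294967295
    return 0 < value <= 4294967295
```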
- class intelmq.lib.harmonization.Accuracy¶
Bases:
Float
Accuracy type. A Float between 0 and 100.
- static is_valid(value: float, sanitize: bool = False) bool ¶
- static sanitize(value: float) float | None ¶
- class intelmq.lib.harmonization.Base64¶
Bases:
String
Base64 type. Always gives unicode strings.
Sanitation encodes to base64 and accepts binary and unicode strings.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- class intelmq.lib.harmonization.Boolean¶
Bases:
GenericType
Boolean type. Without sanitation only python bool is accepted.
Sanitation accepts string ‘true’ and ‘false’ and integers 0 and 1.
- static is_valid(value: bool, sanitize: bool = False) bool ¶
- static sanitize(value: bool) bool | None ¶
- class intelmq.lib.harmonization.ClassificationTaxonomy¶
Bases:
String
classification.taxonomy type.
The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/
- These old values are automatically mapped to the new ones:
‘abusive content’ -> ‘abusive-content’
‘information gathering’ -> ‘information-gathering’
‘intrusion attempts’ -> ‘intrusion-attempts’
‘malicious code’ -> ‘malicious-code’
- Allowed values are:
abusive-content
availability
fraud
information-content-security
information-gathering
intrusion-attempts
intrusions
malicious-code
other
test
vulnerable
- allowed_values = ['abusive-content', 'availability', 'fraud', 'information-content-security', 'information-gathering', 'intrusion-attempts', 'intrusions', 'malicious-code', 'other', 'test', 'vulnerable']¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- class intelmq.lib.harmonization.ClassificationType¶
Bases:
String
classification.type type.
The mapping follows Reference Security Incident Taxonomy Working Group – RSIT WG https://github.com/enisaeu/Reference-Security-Incident-Taxonomy-Task-Force/ with extensions.
- These old values are automatically mapped to the new ones:
‘botnet drone’ -> ‘infected-system’
‘ids alert’ -> ‘ids-alert’
‘c&c’ -> ‘c2-server’
‘c2server’ -> ‘c2-server’
‘infected system’ -> ‘infected-system’
‘malware configuration’ -> ‘malware-configuration’
‘Unauthorised-information-access’ -> ‘unauthorised-information-access’
‘leak’ -> ‘data-leak’
‘vulnerable client’ -> ‘vulnerable-system’
‘vulnerable service’ -> ‘vulnerable-system’
‘ransomware’ -> ‘infected-system’
‘unknown’ -> ‘undetermined’
- These values changed their taxonomy:
- ‘malware’: In terms of the taxonomy ‘malicious-code’, it can be either ‘infected-system’ or ‘malware-distribution’; but for actual malware, it now has the taxonomy ‘other’.
- Allowed values are:
application-compromise
blacklist
brute-force
burglary
c2-server
copyright
data-leak
data-loss
ddos
ddos-amplifier
dga-domain
dos
exploit
harmful-speech
ids-alert
infected-system
information-disclosure
malware
malware-configuration
malware-distribution
masquerade
misconfiguration
other
outage
phishing
potentially-unwanted-accessible
privileged-account-compromise
proxy
sabotage
scanner
sniffing
social-engineering
spam
system-compromise
test
tor
unauthorised-information-access
unauthorised-information-modification
unauthorized-use-of-resources
undetermined
unprivileged-account-compromise
violence
vulnerable-system
weak-crypto
- allowed_values = ('application-compromise', 'blacklist', 'brute-force', 'burglary', 'c2-server', 'copyright', 'data-leak', 'data-loss', 'ddos', 'ddos-amplifier', 'dga-domain', 'dos', 'exploit', 'harmful-speech', 'ids-alert', 'infected-system', 'information-disclosure', 'malware', 'malware-configuration', 'malware-distribution', 'masquerade', 'misconfiguration', 'other', 'outage', 'phishing', 'potentially-unwanted-accessible', 'privileged-account-compromise', 'proxy', 'sabotage', 'scanner', 'sniffing', 'social-engineering', 'spam', 'system-compromise', 'test', 'tor', 'unauthorised-information-access', 'unauthorised-information-modification', 'unauthorized-use-of-resources', 'undetermined', 'unprivileged-account-compromise', 'violence', 'vulnerable-system', 'weak-crypto')¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- class intelmq.lib.harmonization.DateTime¶
Bases:
String
Date and time type for timestamps.
Valid values are timestamps with time zone and in the format ‘%Y-%m-%dT%H:%M:%S+00:00’. Invalid are missing times and missing timezone information (UTC). Microseconds are also allowed.
Sanitation normalizes the timezone to UTC, which is the only allowed timezone.
The following additional conversions are available with the convert function:
timestamp
windows_nt: From Windows NT / AD / LDAP
epoch_millis: From Milliseconds since Epoch
from_format: From a given format, e.g. ‘from_format|%H %M %S %m %d %Y %Z’
from_format_midnight: Date from a given format and assume midnight, e.g. ‘from_format_midnight|%d-%m-%Y’
utc_isoformat: Parse date generated by datetime.isoformat()
fuzzy (or None): Use dateutil’s fuzzy parser, default if no specific parser is given
- TIME_CONVERSIONS = {'epoch_millis': <function DateTime.from_epoch_millis>, 'from_format': <function DateTime.from_format>, 'from_format_midnight': <function DateTime.from_format_midnight>, 'fuzzy': <function DateTime.from_fuzzy>, 'timestamp': <function DateTime.from_timestamp>, 'utc_isoformat': <function DateTime.from_isoformat>, 'windows_nt': <function DateTime.from_windows_nt>, None: <function DateTime.from_fuzzy>}¶
- static convert(value, format='fuzzy') str ¶
Converts date time strings according to the given format. If the timezone is not given or clear, the local time zone is assumed!
timestamp
windows_nt: From Windows NT / AD / LDAP
epoch_millis: From Milliseconds since Epoch
from_format: From a given format, e.g. ‘from_format|%H %M %S %m %d %Y %Z’
from_format_midnight: Date from a given format and assume midnight, e.g. ‘from_format_midnight|%d-%m-%Y’
utc_isoformat: Parse date generated by datetime.isoformat()
fuzzy (or None): Use dateutil’s fuzzy parser, default if no specific parser is given
- static convert_from_format(value: str, format: str) str ¶
This function is replaced by ‘from_format’ function. The original name is kept for backwards compatibility and will be removed in version 4.0.
- static convert_from_format_midnight(value: str, format: str) str ¶
This function is replaced by ‘from_format_midnight’ function. The original name is kept for backwards compatibility and will be removed in version 4.0.
- static convert_fuzzy(value) str ¶
This function is replaced by ‘from_fuzzy’ function. The original name is kept for backwards compatibility and will be removed in version 4.0.
- static from_epoch_millis(value: int | str, return_datetime: bool = False) datetime | str ¶
Returns an ISO-formatted datetime from the given epoch timestamp with milliseconds. It drops the milliseconds, converts the value into a normal timestamp and processes it.
- static from_format(value: str, format: str, return_datetime: bool = False) datetime | str ¶
Converts a datetime with the given format.
- static from_format_midnight(value: str, format: str, return_datetime: bool = False) datetime | str ¶
Converts a date with the given format and adds time 00:00:00 to it.
- static from_fuzzy(value, return_datetime: bool = False) datetime | str ¶
- static from_isoformat(value: str, return_datetime: bool = False) datetime | str ¶
Parses datetime string in ISO format. Naive datetime strings (without timezone) are assumed to be in UTC. It is much faster than universal dateutil parser. Can be used for parsing DateTime fields which are already parsed.
Returns a string with ISO format. If return_datetime is True, the return value is a datetime.datetime object.
- static from_timestamp(value: int | float | str, return_datetime: bool = False) datetime | str ¶
Returns ISO formatted datetime from given timestamp.
- static from_windows_nt(value: int | str, return_datetime: bool = False) datetime | str ¶
Converts the Windows NT / LDAP / Active Directory format to ISO format.
The format is: 100 nanoseconds (10^-7s) since 1601-01-01. UTC is assumed.
- Parameters:
value – Time in LDAP format as integer or string. Will be converted if necessary.
return_datetime – Whether to return datetime object or just string.
- Returns:
Converted ISO format string
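The conversion described above can be sketched as follows; sub-microsecond precision is truncated and the return_datetime handling is omitted:

```python
from datetime import datetime, timedelta, timezone

# Windows NT / AD / LDAP timestamps count 100-nanosecond intervals
# since 1601-01-01 00:00:00 UTC.
_NT_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def from_windows_nt(value: int) -> str:
    # One tick is 100 ns = 0.1 microseconds; integer division truncates
    # the sub-microsecond remainder.
    return (_NT_EPOCH + timedelta(microseconds=value // 10)).isoformat()
```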
- static generate_datetime_now() str ¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- midnight = datetime.time(0, 0)¶
- static parse_utc_isoformat(value: str, return_datetime: bool = False) datetime | str ¶
This function is replaced by ‘from_isoformat’ function. The original name is kept for backwards compatibility and will be removed in version 4.0.
- static sanitize(value: str) str | None ¶
- class intelmq.lib.harmonization.FQDN¶
Bases:
String
Fully qualified domain name type.
All valid lowercase domains are accepted, no IP addresses or URLs. Trailing dot is not allowed.
To prevent values like ‘10.0.0.1:8080’ (#1235), we check for the non-existence of ‘:’.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- static to_ip(value: str) str | None ¶
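A rough sketch of the acceptance rules described above; the regular expression is a simplification and, unlike the real implementation, does not reject IP addresses or apply full domain validation:

```python
import re


def looks_like_fqdn(value: str) -> bool:
    # Per the rules above: no ':' (rejects host:port), no trailing dot,
    # lowercase only.
    if ':' in value or value.endswith('.') or value != value.lower():
        return False
    # At least two labels of letters/digits/hyphens (simplified pattern)
    return re.fullmatch(r'[a-z0-9-]+(\.[a-z0-9-]+)+', value) is not None
```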
- class intelmq.lib.harmonization.Float¶
Bases:
GenericType
Float type. Without sanitation only python float/integer/long is accepted. Boolean is explicitly denied.
Sanitation accepts strings and everything float() accepts.
- static is_valid(value: float, sanitize: bool = False) bool ¶
- static sanitize(value: float) float | None ¶
- class intelmq.lib.harmonization.GenericType¶
Bases:
object
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value) str | None ¶
- class intelmq.lib.harmonization.IPAddress¶
Bases:
String
Type for IP addresses, all families. Uses the ipaddress module.
Sanitation accepts integers, strings and objects of ipaddress.IPv4Address and ipaddress.IPv6Address.
Valid values are only strings. 0.0.0.0 is explicitly not allowed.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: int | str) str | None ¶
- static to_int(value: str) int | None ¶
- static to_reverse(ip_addr: str) str ¶
- static version(value: str) int ¶
- class intelmq.lib.harmonization.IPNetwork¶
Bases:
String
Type for IP networks, all families. Uses the ipaddress module.
Sanitation accepts strings and objects of ipaddress.IPv4Network and ipaddress.IPv6Network. If host bits in strings are set, they will be ignored (e.g. 127.0.0.1/32).
Valid values are only strings.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- static version(value: str) int ¶
- class intelmq.lib.harmonization.Integer¶
Bases:
GenericType
Integer type. Without sanitation only python integer/long is accepted. Bool is explicitly denied.
Sanitation accepts strings and everything int() accepts.
- static is_valid(value: int, sanitize: bool = False) bool ¶
- static sanitize(value: int) int | None ¶
- class intelmq.lib.harmonization.JSON¶
Bases:
String
JSON type.
Sanitation accepts any valid JSON objects.
Valid values are only unicode strings with JSON objects.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- class intelmq.lib.harmonization.JSONDict¶
Bases:
JSON
JSONDict type.
Sanitation accepts pythons dictionaries and JSON strings.
Valid values are only unicode strings with JSON dictionaries.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static is_valid_subitem(value: str) bool ¶
- static sanitize(value: str) str | None ¶
- static sanitize_subitem(value: str) str ¶
- class intelmq.lib.harmonization.LowercaseString¶
Bases:
String
Like string, but only allows lower case characters.
Sanitation lowers all characters.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) bool | None ¶
- class intelmq.lib.harmonization.Registry¶
Bases:
UppercaseString
Registry type. Derived from UppercaseString.
Only valid values: AFRINIC, APNIC, ARIN, LACNIC, RIPE. RIPE-NCC and RIPENCC are normalized to RIPE.
- ENUM = ['AFRINIC', 'APNIC', 'ARIN', 'LACNIC', 'RIPE']¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str ¶
- class intelmq.lib.harmonization.String¶
Bases:
GenericType
Any non-empty string without leading or trailing whitespace.
- static is_valid(value: str, sanitize: bool = False) bool ¶
- class intelmq.lib.harmonization.TLP¶
Bases:
UppercaseString
TLP level type. Derived from UppercaseString.
Only valid values: WHITE, GREEN, AMBER, RED.
Accepted for sanitation are different cases and the prefix ‘tlp:’.
- enum = ['WHITE', 'GREEN', 'AMBER', 'RED']¶
- static is_valid(value: str, sanitize: bool = False) bool ¶
- prefix_pattern = re.compile('^(TLP:?)?\\s*')¶
- static sanitize(value: str) str | None ¶
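The sanitation described above can be sketched with the documented prefix pattern; sanitize_tlp is an illustrative name, not the actual API:

```python
import re

TLP_LEVELS = ['WHITE', 'GREEN', 'AMBER', 'RED']
# The prefix pattern documented above: an optional 'TLP:' plus whitespace
PREFIX_PATTERN = re.compile(r'^(TLP:?)?\s*')


def sanitize_tlp(value: str):
    # Normalize case, strip the optional 'tlp:' prefix, then validate
    value = PREFIX_PATTERN.sub('', value.strip().upper())
    return value if value in TLP_LEVELS else None
```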
- class intelmq.lib.harmonization.URL¶
Bases:
String
URI type. Local and remote.
Sanitation converts hxxp and hxxps to http and https. For local URIs (file) a missing host is replaced by localhost.
Valid values must have the host (network location part).
- static is_valid(value: str, sanitize: bool = False) bool ¶
- static sanitize(value: str) str | None ¶
- static to_domain_name(url: str) str | None ¶
- static to_ip(url: str) str | None ¶
intelmq.lib.message module¶
Messages are the information packages in pipelines.
Use MessageFactory to get a Message object (types Report and Event).
- class intelmq.lib.message.Event(message: dict | tuple = (), auto: bool = False, harmonization: dict | None = None)¶
Bases:
Message
- __init__(message: dict | tuple = (), auto: bool = False, harmonization: dict | None = None) None ¶
- Parameters:
message – If a Report is given, feed.name, feed.url and time.observation will be used to construct the Event. If it’s another type, the value is passed to dict’s init
auto – unused here
harmonization – Harmonization definition to use
- class intelmq.lib.message.Message(message: dict | tuple = (), auto: bool = False, harmonization: dict = None)¶
Bases:
dict
- add(key: str, value: str, sanitize: bool = True, overwrite: bool | None = None, ignore: Sequence = (), raise_failure: bool = True) bool | None ¶
Add a value for the key (after sanitation).
- Parameters:
key – Key as defined in the harmonization
value – A valid value as defined in the harmonization If the value is None or in _IGNORED_VALUES the value will be ignored. If the value is ignored, the key exists and overwrite is True, the key is deleted.
sanitize – Sanitation of harmonization type will be called before validation (default: True)
overwrite – Overwrite an existing value if it already exists (default: None) If True, overwrite an existing value If False, do not overwrite an existing value If None, raise intelmq.exceptions.KeyExists for an existing value
raise_failure – Whether an intelmq.lib.exceptions.InvalidValue should be raised for invalid values (default: True). If False, the return value will be False in case of invalid values.
- Returns:
True if the value has been added.
- False if the value is invalid and raise_failure is False or the value existed
and has not been overwritten.
None if the value has been ignored.
- Raises:
intelmq.lib.exceptions.KeyExists – If key exists and won’t be overwritten explicitly.
intelmq.lib.exceptions.InvalidKey – if key is invalid.
intelmq.lib.exceptions.InvalidArgument – if ignore is not a list or tuple.
intelmq.lib.exceptions.InvalidValue – If value is not valid for the given key and raise_failure is True.
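The overwrite/ignore semantics of add() can be sketched with a minimal dict subclass. MiniMessage is a hypothetical illustration of the documented return values, not the real Message class (which also performs harmonization-based validation and raises intelmq-specific exceptions):

```python
class MiniMessage(dict):
    """Simplified sketch of Message.add()'s documented overwrite semantics."""

    _IGNORED_VALUES = {"", "-", "N/A"}  # assumed placeholder set for this sketch

    def add(self, key, value, overwrite=None, ignore=()):
        if value is None or value in self._IGNORED_VALUES or value in ignore:
            # Ignored value: delete the key if it exists and overwrite is True
            if key in self and overwrite:
                del self[key]
            return None
        if key in self:
            if overwrite is None:
                # The real class raises intelmq.lib.exceptions.KeyExists here
                raise KeyError(f"Key {key!r} already exists")
            if not overwrite:
                return False
        self[key] = value
        return True
```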
- change(key: str, value: str, sanitize: bool = True)¶
- copy() a shallow copy of D ¶
- deep_copy()¶
- finditems(keyword: str)¶
- get(key, default=None)¶
Return the value for key if key is in the dictionary, else default.
- hash(*, filter_keys: Iterable = frozenset({}), filter_type: str = 'blacklist')¶
Return a SHA256 hash of the message as a hexadecimal string. The hash is computed over almost all key/value pairs. Depending on the filter_type parameter (blacklist or whitelist), the keys given in the filter_keys parameter are either ignored or the only ones considered. If given, filter_keys should be a set.
‘time.observation’ will always be ignored.
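The filtering behavior can be sketched as follows. This is an illustrative stand-in, not intelmq's actual hashing code; the exact serialization of the key/value pairs is an assumption here:

```python
import hashlib
import json

def message_hash(event: dict, filter_keys=frozenset(), filter_type="blacklist") -> str:
    # SHA256 over the sorted key/value pairs; 'time.observation' is always ignored
    def keep(key):
        if key == "time.observation":
            return False
        if filter_type == "blacklist":
            return key not in filter_keys
        return key in filter_keys  # whitelist: only the listed keys count
    items = sorted((k, v) for k, v in event.items() if keep(k))
    return hashlib.sha256(json.dumps(items).encode()).hexdigest()
```

Two events that differ only in time.observation therefore hash identically, which is what makes the hash usable for deduplication.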
- is_valid(key: str, value: str, sanitize: bool = True) bool ¶
Checks if a value is valid for the key (after sanitation).
- Parameters:
key – Key of the field
value – Value of the field
sanitize – Sanitation of harmonization type will be called before validation (default: True)
- Returns:
True if the value is valid, otherwise False
- Raises:
intelmq.lib.exceptions.InvalidKey – if given key is invalid.
- serialize()¶
- set_default_value(value: Any = None)¶
Sets a default value for items.
- to_dict(hierarchical: bool = False, with_type: bool = False, jsondict_as_string: bool = False) dict ¶
Returns a copy of self, only based on a dict class.
- Parameters:
hierarchical – Split all keys at a dot and save these subitems in dictionaries.
with_type – Add a value named __type containing the message type
jsondict_as_string – If False (default) treat values in JSONDict fields just as normal ones If True, save such fields as JSON-encoded string. This is the old behavior before version 1.1.
- Returns:
- A dictionary as copy of itself modified according
to the given parameters
- Return type:
new_dict
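The hierarchical=True option splits keys at dots into nested dictionaries. A self-contained sketch of that transformation (a hypothetical helper, not the method itself):

```python
def to_hierarchical(flat: dict) -> dict:
    # Split each key at the dots and build nested dictionaries,
    # as to_dict(hierarchical=True) does
    result = {}
    for key, value in flat.items():
        node = result
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return result
```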
- to_json(hierarchical=False, with_type=False, jsondict_as_string=False)¶
- static unserialize(message_string: str)¶
- update([E, ]**F) None. Update D from dict/iterable E and F. ¶
If E is present and has a .keys() method, then does: for k in E: D[k] = E[k] If E is present and lacks a .keys() method, then does: for k, v in E: D[k] = v In either case, this is followed by: for k in F: D[k] = F[k]
- class intelmq.lib.message.MessageFactory¶
Bases:
object
unserialize: JSON-encoded message to object
serialize: object to JSON-encoded message
- static from_dict(message: dict, harmonization=None, default_type: str | None = None) dict ¶
Takes a dictionary, returns an instance of the correct Message class.
- Parameters:
message – the message which should be converted to a Message object
harmonization – a dictionary holding the used harmonization
default_type – If ‘__type’ is not present in message, the given type will be used
See also
MessageFactory.unserialize MessageFactory.serialize
- static serialize(message)¶
Takes instance of message-derived class and makes JSON-encoded Message.
The class is saved in __type attribute.
- static unserialize(raw_message: str, harmonization: dict = None, default_type: str | None = None) dict ¶
Takes JSON-encoded Message object, returns instance of correct class.
- Parameters:
raw_message – the message which should be converted to a Message object
harmonization – a dictionary holding the used harmonization
default_type – If ‘__type’ is not present in message, the given type will be used
See also
MessageFactory.from_dict MessageFactory.serialize
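The serialize/unserialize round trip can be sketched with plain dicts. This only illustrates the documented __type mechanism; the real factory returns Report/Event instances rather than tuples:

```python
import json

def serialize(message: dict, message_type: str) -> str:
    # The message type is stored in the __type attribute, as documented
    return json.dumps({**message, "__type": message_type})

def unserialize(raw_message: str, default_type=None):
    data = json.loads(raw_message)
    # If __type is absent, fall back to default_type
    message_type = data.pop("__type", default_type)
    return message_type, data
```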
- class intelmq.lib.message.Report(message: dict | tuple = (), auto: bool = False, harmonization: dict | None = None)¶
Bases:
Message
- __init__(message: dict | tuple = (), auto: bool = False, harmonization: dict | None = None) None ¶
- Parameters:
message – Passed along to Message’s and dict’s init. If this is an instance of the Event class, the resulting Report instance has only the fields which are possible in Report, all others are stripped.
auto – if False (default), time.observation is automatically added.
harmonization – Harmonization definition to use
- copy() a shallow copy of D ¶
intelmq.lib.pipeline module¶
Algorithm¶
[Receive] BRPOPLPUSH source_queue -> internal_queue
[Send] LPUSH message -> destination_queue
[Acknowledge] RPOP message <- internal_queue
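The three steps above can be simulated with in-memory deques. This is a sketch of the queue semantics only, not the Redis-backed implementation:

```python
from collections import deque

# LPUSH corresponds to appendleft, RPOP to pop from the right
source_queue, internal_queue, destination_queue = deque(), deque(), deque()

def receive() -> str:
    # BRPOPLPUSH: pop from the right of source_queue, push onto internal_queue
    message = source_queue.pop()
    internal_queue.appendleft(message)
    return message

def send(message: str) -> None:
    # LPUSH onto the destination queue
    destination_queue.appendleft(message)

def acknowledge() -> str:
    # RPOP: drop the processed message from the internal queue
    return internal_queue.pop()
```

Keeping the in-flight message on an internal queue is what allows a bot to crash after receive() without losing the message: it is only removed on acknowledge().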
- class intelmq.lib.pipeline.Amqp(logger, pipeline_args: dict = None, load_balance=False, is_multithreaded=False)¶
Bases:
Pipeline
- check_connection()¶
- clear_queue(queue: str) bool ¶
- connect()¶
- count_queued_messages(*queues) dict ¶
- destination_pipeline_amqp_exchange = ''¶
- destination_pipeline_amqp_virtual_host = '/'¶
- destination_pipeline_db = 2¶
- destination_pipeline_host = '127.0.0.1'¶
- destination_pipeline_password = None¶
- destination_pipeline_socket_timeout = None¶
- destination_pipeline_ssl = False¶
- destination_pipeline_username = None¶
- disconnect()¶
- intelmqctl_rabbitmq_monitoring_url = None¶
- load_configurations(queues_type)¶
- nonempty_queues() set ¶
- queue_args = {'x-queue-mode': 'lazy'}¶
- send(message: str, path: str = '_default', path_permissive: bool = False)¶
In principle we could use AMQP’s exchanges here, but that architecture is incompatible with the format of our pipeline configuration.
- set_queues(queues: dict, queues_type: str)¶
- Parameters:
queues – For the source queue, it’s just a string. For the destination queue, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
queues_type – “source” or “destination”
The method ensures self.destination_queues is a dict of lists. It does not guarantee that a ‘_default’ key exists.
- setup_channel()¶
- source_pipeline_amqp_exchange = ''¶
- source_pipeline_amqp_virtual_host = '/'¶
- source_pipeline_db = 2¶
- source_pipeline_host = '127.0.0.1'¶
- source_pipeline_password = None¶
- source_pipeline_socket_timeout = None¶
- source_pipeline_ssl = False¶
- source_pipeline_username = None¶
- class intelmq.lib.pipeline.Pipeline(logger, pipeline_args: dict = None, load_balance=False, is_multithreaded=False)¶
Bases:
object
- acknowledge()¶
Acknowledge/delete the current message from the source queue
- Raises:
exceptions.PipelineError – If no message is held
- Returns:
None
- clear_queue(queue)¶
- connect()¶
- disconnect()¶
- has_internal_queues = False¶
- nonempty_queues() set ¶
- receive() str ¶
- reject_message()¶
- send(message: str, path: str = '_default', path_permissive: bool = False)¶
- set_queues(queues: str | None, queues_type: str)¶
- Parameters:
queues – For the source queue, it’s just a string. For the destination queue, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
queues_type – “source” or “destination”
The method ensures self.destination_queues is a dict of lists. It does not guarantee that a ‘_default’ key exists.
- class intelmq.lib.pipeline.PipelineFactory¶
Bases:
object
- static create(logger, broker=None, direction=None, queues=None, pipeline_args: dict | None = None, load_balance=False, is_multithreaded=False)¶
direction: “source” or “destination”, optional, needed for queues
queues: needs direction to be set, calls set_queues
bot: Bot instance
- class intelmq.lib.pipeline.Pythonlist(logger, pipeline_args: dict = None, load_balance=False, is_multithreaded=False)¶
Bases:
Pipeline
This pipeline uses simple lists and is intended for testing purposes only.
It behaves in most ways like a normal pipeline, including all encoding and decoding steps, but works entirely without external modules and programs. Data is saved as it comes (no conversion) and operations are non-blocking.
- _acknowledge()¶
Removes a message from the internal queue and returns it
- _receive() bytes ¶
Receives the last not yet acknowledged message.
Does not block unlike the other pipelines.
- _reject_message()¶
No-op because of the internal queue
- clear_all_queues()¶
Empties all queues / state
- clear_queue(queue)¶
Empties given queue.
- connect()¶
- count_queued_messages(*queues) dict ¶
Returns the number of queued messages over all given queue names.
- disconnect()¶
- send(message: str, path: str = '_default', path_permissive: bool = False)¶
Sends a message to the destination queues
- set_queues(queues, queues_type)¶
- Parameters:
queues – For the source queue, it’s just a string. For the destination queue, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
queues_type – “source” or “destination”
The method ensures self.destination_queues is a dict of lists. It does not guarantee that a ‘_default’ key exists.
- state: Dict[str, list] = {}¶
- class intelmq.lib.pipeline.Redis(logger, pipeline_args: dict = None, load_balance=False, is_multithreaded=False)¶
Bases:
Pipeline
- _reject_message()¶
Rejecting is a no-op as the message is in the internal queue anyway.
- clear_queue(queue)¶
Clears a queue by removing (deleting) the key, which is the same as an empty list in Redis
- connect()¶
- count_queued_messages(*queues) dict ¶
- destination_pipeline_db = 2¶
- destination_pipeline_host = '127.0.0.1'¶
- destination_pipeline_password = None¶
- disconnect()¶
- has_internal_queues = True¶
- load_configurations(queues_type)¶
- nonempty_queues() set ¶
Returns a list of all currently non-empty queues.
- pipe = None¶
- send(message: str, path: str = '_default', path_permissive: bool = False)¶
- set_queues(queues, queues_type)¶
- Parameters:
queues – For the source queue, it’s just a string. For the destination queue, it can be one of the following: None, a list, or a dict (of strings or lists; one of the keys should be ‘_default’)
queues_type – “source” or “destination”
The method ensures self.destination_queues is a dict of lists. It does not guarantee that a ‘_default’ key exists.
- source_pipeline_db = 2¶
- source_pipeline_host = '127.0.0.1'¶
- source_pipeline_password = None¶
intelmq.lib.processmanager module¶
- class intelmq.lib.processmanager.IntelMQProcessManager(*args, **kwargs)¶
Bases:
ProcessManagerInterface
- PIDDIR = '/opt/intelmq/var/run/'¶
- PIDFILE = '/opt/intelmq/var/run/{}.pid'¶
- static _interpret_commandline(pid: int, cmdline: Iterable[str], module: str, bot_id: str) bool | str ¶
Separate function to allow easy testing
Parameters¶
- pid : int
Process ID, used for return values (error messages) only.
- cmdline : Iterable[str]
The command line of the process.
- module : str
The module of the bot.
- bot_id : str
The ID of the bot.
Returns¶
- Union[bool, str]
True if the command line matches the given bot, otherwise an error message.
- bot_reload(bot_id, getstatus=True)¶
- bot_run(bot_id, run_subcommand=None, console_type=None, message_action_kind=None, dryrun=None, msg=None, show_sent=None, loglevel=None)¶
- bot_start(bot_id, getstatus=True)¶
- bot_status(bot_id, *, proc=None)¶
- bot_stop(bot_id, getstatus=True)¶
- class intelmq.lib.processmanager.ProcessManagerInterface(interactive: bool, runtime_configuration: dict, logger: Logger, returntype: ReturnType, quiet: bool)¶
Bases:
object
Defines the interface all process managers must adhere to
- abstract bot_reload(bot_id: str, getstatus=True)¶
- abstract bot_run(bot_id: str, run_subcommand=None, console_type=None, message_action_kind=None, dryrun=None, msg=None, show_sent=None, loglevel=None)¶
- abstract bot_start(bot_id: str, getstatus=True)¶
- abstract bot_status(bot_id: str) str ¶
- abstract bot_stop(bot_id: str, getstatus=True)¶
- class intelmq.lib.processmanager.SupervisorProcessManager(interactive: bool, runtime_configuration: dict, logger: Logger, returntype: ReturnType, quiet: bool)¶
Bases:
ProcessManagerInterface
- DEFAULT_SOCKET_PATH = '/var/run/supervisor.sock'¶
- class ProcessState¶
Bases:
object
- BACKOFF = 30¶
- EXITED = 100¶
- FATAL = 200¶
- RUNNING = 20¶
- STARTING = 10¶
- STOPPED = 0¶
- STOPPING = 40¶
- UNKNOWN = 1000¶
- static is_running(state: int) bool ¶
- class RpcFaults¶
Bases:
object
- ABNORMAL_TERMINATION = 40¶
- ALREADY_ADDED = 90¶
- ALREADY_STARTED = 60¶
- BAD_ARGUMENTS = 3¶
- BAD_NAME = 10¶
- BAD_SIGNAL = 11¶
- CANT_REREAD = 92¶
- FAILED = 30¶
- INCORRECT_PARAMETERS = 2¶
- NOT_EXECUTABLE = 21¶
- NOT_RUNNING = 70¶
- NO_FILE = 20¶
- SHUTDOWN_STATE = 6¶
- SIGNATURE_UNSUPPORTED = 4¶
- SPAWN_ERROR = 50¶
- STILL_RUNNING = 91¶
- SUCCESS = 80¶
- UNKNOWN_METHOD = 1¶
- SUPERVISOR_GROUP = 'intelmq'¶
- bot_reload(bot_id: str, getstatus: bool = True)¶
- bot_run(bot_id, run_subcommand=None, console_type=None, message_action_kind=None, dryrun=None, msg=None, show_sent=None, loglevel=None)¶
- bot_start(bot_id: str, getstatus: bool = True)¶
- bot_status(bot_id: str) str ¶
- bot_stop(bot_id: str, getstatus: bool = True)¶
- intelmq.lib.processmanager.process_managers()¶
Collect the process managers in this module that implement the ProcessManagerInterface. Returns a dict with a short identifier of the process manager as key and the class as value: {‘intelmq’: intelmq.lib.processmanager.IntelMQProcessManager, ‘supervisor’: intelmq.lib.processmanager.SupervisorProcessManager}
intelmq.lib.splitreports module¶
Support for splitting large raw reports into smaller ones.
The main intention of this module is to help work around a limitation in Redis, which restricts strings to 512MB. Collector bots can use the functions in this module to split the incoming data into smaller pieces which can be sent as separate reports.
Collectors usually don’t really know anything about the data they collect, so the data cannot be reliably split into pieces in all cases. This module can be used for those cases, though, where users know that the data is actually a line-based format and can easily be split into pieces at newline characters. For this to work, some assumptions are made:
The data can be split at any newline character
This would not work for e.g. CSV-based formats, which allow newlines in values as long as they are within quotes.
The lines are much shorter than the maximum chunk size
Obviously, if this condition does not hold, it may not be possible to split the data into small enough chunks at newline characters.
Other considerations:
To accommodate CSV formats, the code can optionally replicate the first line of the file at the start of all chunks.
The Redis limit applies to the entire IntelMQ report, not just the raw data. The report has some metadata in addition to the raw data, and the raw data is encoded as base64 in the report. The maximum chunk size must take this into account, by multiplying the actual limit by 3/4 and subtracting a generous amount for the metadata.
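The sizing consideration above works out as follows. The 10 MB metadata headroom is an assumed, illustrative value:

```python
# Redis limits a string to 512 MB. The raw data is base64-encoded in the
# report (a 4/3 expansion), so the usable raw size is about 3/4 of the
# limit, minus headroom for the report metadata.
REDIS_LIMIT = 512 * 2**20
METADATA_HEADROOM = 10 * 2**20  # assumed generous allowance
max_chunk_size = REDIS_LIMIT * 3 // 4 - METADATA_HEADROOM
```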
- intelmq.lib.splitreports.generate_reports(report_template: Report, infile: BinaryIO, chunk_size: int | None, copy_header_line: bool) Generator[Report, None, None] ¶
Generate reports from a template and input file, optionally split into chunks.
If chunk_size is None, a single report is generated with the entire contents of infile as the raw data. Otherwise chunk_size should be an integer giving the maximum number of bytes in a chunk. The data read from infile is then split into chunks of this size at newline characters (see read_delimited_chunks). For each of the chunks, this function yields a copy of the report_template with that chunk as the value of the raw attribute.
When splitting the data into chunks, if copy_header_line is true, the first line of the file is read before chunking and then prepended to each of the chunks. This is particularly useful when splitting CSV files.
The infile should be a file-like object. generate_reports uses only two methods, readline and read, with readline only called once and only if copy_header_line is true. Both methods should return bytes objects.
- Params:
report_template: report used as template for all yielded copies infile: stream to read from chunk_size: maximum size of each chunk copy_header_line: copy the first line of the infile to each chunk
- Yields:
report – a Report object holding the chunk in the raw field
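The documented behavior can be sketched with plain dicts in place of Report objects. `generate_report_dicts` is a hypothetical stand-in for illustration only; the real function yields copies of the report_template with the chunk in the raw field:

```python
import io

def generate_report_dicts(template: dict, infile, chunk_size, copy_header_line):
    # Plain-dict sketch of generate_reports' documented behavior
    if chunk_size is None:
        # No splitting: a single report with the entire file contents
        yield {**template, "raw": infile.read()}
        return
    header = infile.readline() if copy_header_line else b""
    chunk = b""
    for line in infile:
        if chunk and len(chunk) + len(line) > chunk_size:
            yield {**template, "raw": header + chunk}
            chunk = b""
        chunk += line
    if chunk:
        yield {**template, "raw": header + chunk}
```

For a CSV input with copy_header_line=True, every yielded report starts with the header line followed by a slice of the data rows.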
- intelmq.lib.splitreports.read_delimited_chunks(infile: BinaryIO, chunk_size: int) Generator[bytes, None, None] ¶
Yield the contents of infile in chunk_size pieces ending at newlines. The individual pieces, except for the last one, end in newlines and are smaller than chunk_size if possible.
- Params:
infile: stream to read from chunk_size: maximum size of each chunk
- Yields:
chunk – chunk with maximum size of chunk_size if possible
- intelmq.lib.splitreports.split_chunks(chunk: bytes, chunk_size: int) List[bytes] ¶
Split a bytestring into chunk_size pieces at ASCII newline characters.
The return value is a list of bytestring objects. Concatenating all of them yields a bytestring equal to the input string. All items in the list except the last item end in a newline. The items are shorter than chunk_size if possible, but may be longer if the input data has places where the distance between two newline characters is too long.
Note in particular that the last item may not end in a newline!
- Params:
chunk: The string to be split chunk_size: maximum size of each chunk
- Returns:
List of resulting chunks
- Return type:
chunks
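The splitting behavior described above can be sketched as follows. `split_at_newlines` is an illustrative reimplementation of the documented contract, not intelmq's split_chunks itself:

```python
def split_at_newlines(data: bytes, chunk_size: int) -> list:
    # Pieces are at most chunk_size where possible; a piece may exceed
    # chunk_size when two newlines are further apart than that.
    chunks = []
    while len(data) > chunk_size:
        cut = data.rfind(b"\n", 0, chunk_size)
        if cut == -1:
            # No newline within the limit: take the next one, however far
            cut = data.find(b"\n", chunk_size)
            if cut == -1:
                break  # no newline at all: keep the remainder as one piece
        chunks.append(data[:cut + 1])
        data = data[cut + 1:]
    if data:
        chunks.append(data)
    return chunks
```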
intelmq.lib.test module¶
Utilities for testing intelmq bots.
The BotTestCase can be used as base class for unittests on bots. It includes some basic generic tests (logged errors, correct pipeline setup).
- class intelmq.lib.test.BotTestCase¶
Bases:
object
Provides common tests and assert methods for bot testing.
- assertAnyLoglineEqual(message: str, levelname: str = 'ERROR')¶
Asserts if any logline matches a specific requirement.
- Parameters:
message – Message text which is compared
levelname – Log level of the logline which is asserted, upper case.
- Raises:
ValueError – if logline message has not been found
- assertLogMatches(pattern: str, levelname: str = 'ERROR')¶
Asserts if any logline matches a specific requirement.
- Parameters:
pattern – Message text which is compared, regular expression.
levelname – Log level of the logline which is asserted, upper case.
- assertLoglineEqual(line_no: int, message: str, levelname: str = 'ERROR')¶
Asserts if a logline matches a specific requirement.
- Parameters:
line_no – Number of the logline which is asserted
message – Message text which is compared
levelname – Log level of logline which is asserted
- assertLoglineMatches(line_no: int, pattern: str, levelname: str = 'ERROR')¶
Asserts if a logline matches a specific requirement.
- Parameters:
line_no – Number of the logline which is asserted
pattern – Message text which is compared
levelname – Log level of the logline which is asserted, upper case.
- assertMessageEqual(queue_pos, expected_msg, compare_raw=True, path='_default')¶
Asserts that the given expected_msg is contained in the generated event at the given queue position.
- assertNotRegexpMatchesLog(pattern)¶
Asserts that pattern doesn’t match against log.
- assertOutputQueueLen(queue_len=0, path='_default')¶
Asserts that the output queue has the expected length.
- assertRegexpMatchesLog(pattern)¶
Asserts that pattern matches against log.
- bot_types = {'collector': 'CollectorBot', 'expert': 'ExpertBot', 'output': 'OutputBot', 'parser': 'ParserBot'}¶
- get_input_internal_queue()¶
Returns the internal input queue of this bot which can be filled with fixture data in setUp()
- get_input_queue()¶
Returns the input queue of this bot which can be filled with fixture data in setUp()
- get_mocked_logger(logger)¶
- get_output_queue(path='_default')¶
Getter for items in the output queues of this bot. Use in TestCase scenarios. If there are multiple queues in a named queue group, all their items are returned chained.
- harmonization = {'event': {'classification.identifier': {'description': 'The lowercase identifier defines the actual software or service (e.g. ``heartbleed`` or ``ntp_version``) or standardized malware name (e.g. ``zeus``). Note that you MAY overwrite this field during processing for your individual setup. This field is not standardized across IntelMQ setups/users.', 'type': 'String'}, 'classification.taxonomy': {'description': 'We recognize the need for the CSIRT teams to apply a static (incident) taxonomy to abuse data. With this goal in mind the type IOC will serve as a basis for this activity. Each value of the dynamic type mapping translates to a an element in the static taxonomy. The European CSIRT teams for example have decided to apply the eCSIRT.net incident classification. The value of the taxonomy key is thus a derivative of the dynamic type above. For more information about check `ENISA taxonomies <http://www.enisa.europa.eu/activities/cert/support/incident-management/browsable/incident-handling-process/incident-taxonomy/existing-taxonomies>`_.', 'length': 100, 'type': 'ClassificationTaxonomy'}, 'classification.type': {'description': 'The abuse type IOC is one of the most crucial pieces of information for any given abuse event. The main idea of dynamic typing is to keep our ontology flexible, since we need to evolve with the evolving threatscape of abuse data. In contrast with the static taxonomy below, the dynamic typing is used to perform business decisions in the abuse handling pipeline. Furthermore, the value data set should be kept as minimal as possible to avoid *type explosion*, which in turn dilutes the business value of the dynamic typing. 
In general, we normally have two types of abuse type IOC: ones referring to a compromised resource or ones referring to pieces of the criminal infrastructure, such as a command and control servers for example.', 'type': 'ClassificationType'}, 'comment': {'description': 'Free text commentary about the abuse event inserted by an analyst.', 'type': 'String'}, 'destination.abuse_contact': {'description': 'Abuse contact for destination address. A comma separated list.', 'type': 'LowercaseString'}, 'destination.account': {'description': 'An account name or email address, which has been identified to relate to the destination of an abuse event.', 'type': 'String'}, 'destination.allocated': {'description': 'Allocation date corresponding to BGP prefix.', 'type': 'DateTime'}, 'destination.as_name': {'description': 'The autonomous system name to which the connection headed.', 'type': 'String'}, 'destination.asn': {'description': 'The autonomous system number to which the connection headed.', 'type': 'ASN'}, 'destination.domain_suffix': {'description': 'The suffix of the domain from the public suffix list.', 'type': 'FQDN'}, 'destination.fqdn': {'description': 'A DNS name related to the host from which the connection originated. DNS allows even binary data in DNS, so we have to allow everything. 
A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'destination.geolocation.cc': {'description': 'Country-Code according to ISO3166-1 alpha-2 for the destination IP.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'destination.geolocation.city': {'description': 'Some geolocation services refer to city-level geolocation.', 'type': 'String'}, 'destination.geolocation.country': {'description': 'The country name derived from the ISO3166 country code (assigned to cc field).', 'type': 'String'}, 'destination.geolocation.latitude': {'description': 'Latitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'destination.geolocation.longitude': {'description': 'Longitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'destination.geolocation.region': {'description': 'Some geolocation services refer to region-level geolocation.', 'type': 'String'}, 'destination.geolocation.state': {'description': 'Some geolocation services refer to state-level geolocation.', 'type': 'String'}, 'destination.ip': {'description': 'The IP which is the target of the observed connections.', 'type': 'IPAddress'}, 'destination.local_hostname': {'description': 'Some sources report an internal hostname within a NAT related to the name configured for a compromised system', 'type': 'String'}, 'destination.local_ip': {'description': 'Some sources report an internal (NATed) IP address related a compromised system. N.B. RFC1918 IPs are OK here.', 'type': 'IPAddress'}, 'destination.network': {'description': 'CIDR for an autonomous system. Also known as BGP prefix. 
If multiple values are possible, select the most specific.', 'type': 'IPNetwork'}, 'destination.port': {'description': 'The port to which the connection headed.', 'type': 'Integer'}, 'destination.registry': {'description': 'The IP registry a given ip address is allocated by.', 'length': 7, 'type': 'Registry'}, 'destination.reverse_dns': {'description': 'Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'destination.tor_node': {'description': 'If the destination IP was a known tor node.', 'type': 'Boolean'}, 'destination.url': {'description': 'A URL denotes on IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource.', 'type': 'URL'}, 'destination.urlpath': {'description': 'The path portion of an HTTP or related network request.', 'type': 'String'}, 'event_description.target': {'description': 'Some sources denominate the target (organization) of a an attack.', 'type': 'String'}, 'event_description.text': {'description': 'A free-form textual description of an abuse event.', 'type': 'String'}, 'event_description.url': {'description': 'A description URL is a link to a further description of the the abuse event in question.', 'type': 'URL'}, 'event_hash': {'description': 'Computed event hash with specific keys and values that identify a unique event. At present, the hash should default to using the SHA1 function. Please note that for an event hash to be able to match more than one event (deduplication) the receiver of an event should calculate it based on a minimal set of keys and values present in the event. 
Using for example the observation time in the calculation will most likely render the checksum useless for deduplication purposes.', 'length': 40, 'regex': '^[A-F0-9./]+$', 'type': 'UppercaseString'}, 'extra': {'description': 'All anecdotal information, which cannot be parsed into the data harmonization elements. E.g. os.name, os.version, etc. **Note**: this is only intended for mapping any fields which can not map naturally into the data harmonization. It is not intended for extending the data harmonization with your own fields.', 'type': 'JSONDict'}, 'feed.accuracy': {'description': 'A float between 0 and 100 that represents how accurate the data in the feed is', 'type': 'Accuracy'}, 'feed.code': {'description': 'Code name for the feed, e.g. DFGS, HSDAG etc.', 'length': 100, 'type': 'String'}, 'feed.documentation': {'description': 'A URL or hint where to find the documentation of this feed.', 'type': 'String'}, 'feed.name': {'description': 'Name for the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.provider': {'description': 'Name for the provider of the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.url': {'description': 'The URL of a given abuse feed, where applicable', 'type': 'URL'}, 'malware.hash.md5': {'description': 'A string depicting an MD5 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.hash.sha1': {'description': 'A string depicting a SHA1 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.hash.sha256': {'description': 'A string depicting a SHA256 checksum for a file, be it a malware sample for example.', 'length': 200, 'regex': '^[ -~]+$', 'type': 'String'}, 'malware.name': {'description': 'The malware name in lower case.', 'regex': '^[ -~]+$', 'type': 'LowercaseString'}, 'malware.version': {'description': 'A version string for an identified 
artifact generation, e.g. a crime-ware kit.', 'regex': '^[ -~]+$', 'type': 'String'}, 'misp.attribute_uuid': {'description': 'MISP - Malware Information Sharing Platform & Threat Sharing UUID of an attribute.', 'length': 36, 'regex': '^[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{12}$', 'type': 'LowercaseString'}, 'misp.event_uuid': {'description': 'MISP - Malware Information Sharing Platform & Threat Sharing UUID.', 'length': 36, 'regex': '^[a-z0-9]{8}-[a-z0-9]{4}-[a-z0-9]{4}-[a-z0-9]{4}-[0-9a-z]{12}$', 'type': 'LowercaseString'}, 'output': {'description': 'Event data converted into foreign format, intended to be exported by output plugin.', 'type': 'JSON'}, 'protocol.application': {'description': 'e.g. vnc, ssh, sip, irc, http or smtp.', 'length': 100, 'regex': '^[ -~]+$', 'type': 'LowercaseString'}, 'protocol.transport': {'description': 'e.g. tcp, udp, icmp.', 'iregex': '^(ip|icmp|igmp|ggp|ipencap|st2|tcp|cbt|egp|igp|bbn-rcc|nvp(-ii)?|pup|argus|emcon|xnet|chaos|udp|mux|dcn|hmp|prm|xns-idp|trunk-1|trunk-2|leaf-1|leaf-2|rdp|irtp|iso-tp4|netblt|mfe-nsp|merit-inp|sep|3pc|idpr|xtp|ddp|idpr-cmtp|tp\\+\\+|il|ipv6|sdrp|ipv6-route|ipv6-frag|idrp|rsvp|gre|mhrp|bna|esp|ah|i-nlsp|swipe|narp|mobile|tlsp|skip|ipv6-icmp|ipv6-nonxt|ipv6-opts|cftp|sat-expak|kryptolan|rvd|ippc|sat-mon|visa|ipcv|cpnx|cphb|wsn|pvp|br-sat-mon|sun-nd|wb-mon|wb-expak|iso-ip|vmtp|secure-vmtp|vines|ttp|nsfnet-igp|dgp|tcf|eigrp|ospf|sprite-rpc|larp|mtp|ax.25|ipip|micp|scc-sp|etherip|encap|gmtp|ifmp|pnni|pim|aris|scps|qnx|a/n|ipcomp|snp|compaq-peer|ipx-in-ip|vrrp|pgm|l2tp|ddx|iatp|st|srp|uti|smp|sm|ptp|isis|fire|crtp|crdup|sscopmce|iplt|sps|pipe|sctp|fc|divert)$', 'length': 11, 'type': 'LowercaseString'}, 'raw': {'description': 'The original line of the event from encoded in base64.', 'type': 'Base64'}, 'rtir_id': {'description': 'Request Tracker Incident Response ticket id.', 'type': 'Integer'}, 'screenshot_url': {'description': 'Some source may report URLs related to a an image generated of 
a resource without any metadata. Or an URL pointing to resource, which has been rendered into a webshot, e.g. a PNG image and the relevant metadata related to its retrieval/generation.', 'type': 'URL'}, 'source.abuse_contact': {'description': 'Abuse contact for source address. A comma separated list.', 'type': 'LowercaseString'}, 'source.account': {'description': 'An account name or email address, which has been identified to relate to the source of an abuse event.', 'type': 'String'}, 'source.allocated': {'description': 'Allocation date corresponding to BGP prefix.', 'type': 'DateTime'}, 'source.as_name': {'description': 'The autonomous system name from which the connection originated.', 'type': 'String'}, 'source.asn': {'description': 'The autonomous system number from which originated the connection.', 'type': 'ASN'}, 'source.domain_suffix': {'description': 'The suffix of the domain from the public suffix list.', 'type': 'FQDN'}, 'source.fqdn': {'description': 'A DNS name related to the host from which the connection originated. DNS allows even binary data in DNS, so we have to allow everything. 
A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'source.geolocation.cc': {'description': 'Country-Code according to ISO3166-1 alpha-2 for the source IP.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.city': {'description': 'Some geolocation services refer to city-level geolocation.', 'type': 'String'}, 'source.geolocation.country': {'description': 'The country name derived from the ISO3166 country code (assigned to cc field).', 'type': 'String'}, 'source.geolocation.cymru_cc': {'description': 'The country code denoted for the ip by the Team Cymru asn to ip mapping service.', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.geoip_cc': {'description': 'MaxMind Country Code (ISO3166-1 alpha-2).', 'length': 2, 'regex': '^[a-zA-Z0-9]{2}$', 'type': 'UppercaseString'}, 'source.geolocation.latitude': {'description': 'Latitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'source.geolocation.longitude': {'description': 'Longitude coordinates derived from a geolocation service, such as MaxMind geoip db.', 'type': 'Float'}, 'source.geolocation.region': {'description': 'Some geolocation services refer to region-level geolocation.', 'type': 'String'}, 'source.geolocation.state': {'description': 'Some geolocation services refer to state-level geolocation.', 'type': 'String'}, 'source.ip': {'description': 'The ip observed to initiate the connection', 'type': 'IPAddress'}, 'source.local_hostname': {'description': 'Some sources report a internal hostname within a NAT related to the name configured for a compromised system', 'type': 'String'}, 'source.local_ip': {'description': 'Some sources report a internal (NATed) IP address related a compromised system. N.B. RFC1918 IPs are OK here.', 'type': 'IPAddress'}, 'source.network': {'description': 'CIDR for an autonomous system. 
Also known as BGP prefix. If multiple values are possible, select the most specific.', 'type': 'IPNetwork'}, 'source.port': {'description': 'The port from which the connection originated.', 'length': 5, 'type': 'Integer'}, 'source.registry': {'description': 'The IP registry a given ip address is allocated by.', 'length': 7, 'type': 'Registry'}, 'source.reverse_dns': {'description': 'Reverse DNS name acquired through a reverse DNS query on an IP address. N.B. Record types other than PTR records may also appear in the reverse DNS tree. Furthermore, unfortunately, there is no rule prohibiting people from writing anything in a PTR record. Even JavaScript will work. A final point is stripped, string is converted to lower case characters.', 'regex': '^.*[^\\.]$', 'type': 'FQDN'}, 'source.tor_node': {'description': 'If the source IP was a known tor node.', 'type': 'Boolean'}, 'source.url': {'description': 'A URL denotes an IOC, which refers to a malicious resource, whose interpretation is defined by the abuse type. A URL with the abuse type phishing refers to a phishing resource.', 'type': 'URL'}, 'source.urlpath': {'description': 'The path portion of an HTTP or related network request.', 'type': 'String'}, 'status': {'description': 'Status of the malicious resource (phishing, dropzone, etc), e.g. online, offline.', 'type': 'String'}, 'time.observation': {'description': 'The time the collector of the local instance processed (observed) the event.', 'type': 'DateTime'}, 'time.source': {'description': 'The time of occurrence of the event as reported the feed (source).', 'type': 'DateTime'}, 'tlp': {'description': 'Traffic Light Protocol level of the event.', 'type': 'TLP'}}, 'report': {'extra': {'description': 'All anecdotal information of the report, which cannot be parsed into the data harmonization elements. E.g. subject of mails, etc. 
This is data is not automatically propagated to the events.', 'type': 'JSONDict'}, 'feed.accuracy': {'description': 'A float between 0 and 100 that represents how accurate the data in the feed is', 'type': 'Accuracy'}, 'feed.code': {'description': 'Code name for the feed, e.g. DFGS, HSDAG etc.', 'length': 100, 'type': 'String'}, 'feed.documentation': {'description': 'A URL or hint where to find the documentation of this feed.', 'type': 'String'}, 'feed.name': {'description': 'Name for the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.provider': {'description': 'Name for the provider of the feed, usually found in collector bot configuration.', 'type': 'String'}, 'feed.url': {'description': 'The URL of a given abuse feed, where applicable', 'type': 'URL'}, 'raw': {'description': 'The original raw and unparsed data encoded in base64.', 'type': 'Base64'}, 'rtir_id': {'description': 'Request Tracker Incident Response ticket id.', 'type': 'Integer'}, 'time.observation': {'description': 'The time the collector of the local instance processed (observed) the event.', 'type': 'DateTime'}}}¶
- property input_queue¶
Returns the input queue of this bot, which can be filled with fixture data in setUp().
- new_event()¶
- new_report(auto=False, examples=False)¶
- prepare_bot(parameters={}, destination_queues=None, prepare_source_queue: bool = True)¶
Reconfigures the bot with the changed attributes.
- Parameters:
parameters – optional bot parameters for this run, as dict
destination_queues – optional definition of destination queues default: {“_default”: “{}-output”.format(self.bot_id)}
- prepare_source_queue()¶
- run_bot(iterations: int = 1, error_on_pipeline: bool = False, prepare=True, parameters={}, allowed_error_count=0, allowed_warning_count=0, stop_bot: bool = True, expected_internal_queue_size: int = 0)¶
Call this method to perform an actual test run of the specified bot.
- Parameters:
iterations – Bot instance will be run the given times, defaults to 1.
parameters – passed to prepare_bot
allowed_error_count – maximum number of allowed errors in the logs
allowed_warning_count – maximum number of allowed warnings in the logs
stop_bot – Whether the bot should be stopped/shut down after running. Set to False if you are calling this method again afterwards, as the bot shutdown destroys structures (pipeline, etc.)
- classmethod setUpClass()¶
Set default values and save original functions.
- set_input_queue(seq)¶
Setter for the input queue of this bot
- tearDown()¶
Check if the bot did consume all messages.
Executed after every test run.
- classmethod tearDownClass()¶
- test_bot_name(*args, **kwargs)¶
Test if Bot has a valid name. Must be CamelCase and end with CollectorBot etc.
Accept arbitrary arguments in case the test methods get mocked and get some additional arguments. All arguments are ignored.
- test_static_bot_check_method(*args, **kwargs)¶
Check if the bot’s static check() method completes without errors (exceptions). The return value (errors) is not checked.
The arbitrary parameters for this test function are needed because if a mocker mocks the test class, parameters can be added. See for example intelmq.tests.bots.collectors.http.test_collector.
intelmq.lib.upgrades module¶
© 2020 Sebastian Wagner <wagner@cert.at>
SPDX-License-Identifier: AGPL-3.0-or-later
- intelmq.lib.upgrades.v100_dev7_modify_syntax(configuration, harmonization, dry_run, **kwargs)¶
Migrate modify bot configuration format
- intelmq.lib.upgrades.v110_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Checking for deprecated runtime configurations (stomp collector, cymru parser, ripe expert, collector feed parameter)
- intelmq.lib.upgrades.v110_shadowserver_feednames(configuration, harmonization, dry_run, **kwargs)¶
Replace deprecated Shadowserver feednames
- intelmq.lib.upgrades.v111_defaults_process_manager(configuration, harmonization, dry_run, **kwargs)¶
Fix typo in proccess_manager parameter
- intelmq.lib.upgrades.v112_feodo_tracker_domains(configuration, harmonization, dry_run, **kwargs)¶
Search for discontinued feodotracker domains feed
- intelmq.lib.upgrades.v112_feodo_tracker_ips(configuration, harmonization, dry_run, **kwargs)¶
Fix URL of feodotracker IPs feed in runtime configuration
- intelmq.lib.upgrades.v200_defaults_broker(configuration, harmonization, dry_run, **kwargs)¶
Insert *_pipeline_broker into, and delete broker from, the defaults configuration
- intelmq.lib.upgrades.v200_defaults_ssl_ca_certificate(configuration, harmonization, dry_run, **kwargs)¶
Add ssl_ca_certificate to defaults
- intelmq.lib.upgrades.v200_defaults_statistics(configuration, harmonization, dry_run, **kwargs)¶
Inserting statistics_* parameters into defaults configuration file
- intelmq.lib.upgrades.v202_fixes(configuration, harmonization, dry_run, **kwargs)¶
Migrate Collector parameter feed to name. RIPE expert set query_ripe_stat_ip with query_ripe_stat_asn as default. Set cymru whois expert overwrite to true.
- intelmq.lib.upgrades.v210_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Migrating configuration
- intelmq.lib.upgrades.v213_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Migrate attach_unzip to extract_files for the mail attachment collector
- intelmq.lib.upgrades.v213_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feed configuration for changed feed parameters.
- intelmq.lib.upgrades.v220_azure_collector(configuration, harmonization, dry_run, **kwargs)¶
Checking for the Microsoft Azure collector
- intelmq.lib.upgrades.v220_configuration(configuration, harmonization, dry_run, **kwargs)¶
Migrating configuration
- intelmq.lib.upgrades.v220_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feed configuration for changed feed parameters.
- intelmq.lib.upgrades.v221_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feeds’ configuration for changed/fixed parameters. Deprecation of HP Hosts file feed & parser.
- intelmq.lib.upgrades.v222_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrate Shadowserver feed name
- intelmq.lib.upgrades.v230_csv_parser_parameter_fix(configuration, harmonization, dry_run, **kwargs)¶
Fix CSV parser parameter misspelling
- intelmq.lib.upgrades.v230_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Deprecate malwaredomainlist parser
- intelmq.lib.upgrades.v230_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feeds’ configuration for changed/fixed parameter
- intelmq.lib.upgrades.v233_feodotracker_browse(configuration, harmonization, dry_run, **kwargs)¶
Migrate Abuse.ch Feodotracker Browser feed parsing parameters
- intelmq.lib.upgrades.v300_bots_file_removal(configuration, harmonization, dry_run, **kwargs)¶
Remove BOTS file
- intelmq.lib.upgrades.v300_defaults_file_removal(configuration, harmonization, dry_run, **kwargs)¶
Remove the defaults.conf file
- intelmq.lib.upgrades.v300_pipeline_file_removal(configuration, harmonization, dry_run, **kwargs)¶
Remove the pipeline.conf file
- intelmq.lib.upgrades.v301_deprecations(configuration, harmonization, dry_run, **kwargs)¶
Deprecate malwaredomains parser and collector
- intelmq.lib.upgrades.v310_feed_changes(configuration, harmonization, dry_run, **kwargs)¶
Migrates feeds’ configuration for changed/fixed parameter
- intelmq.lib.upgrades.v310_shadowserver_feednames(configuration, harmonization, dry_run, **kwargs)¶
Remove legacy Shadowserver feednames
- intelmq.lib.upgrades.v320_update_turris_greylist_url(configuration, harmonization, dry_run, **kwargs)¶
Updates Turris Greylist feed URL.
intelmq.lib.utils module¶
Common utility functions for intelmq.
decode, encode, base64_decode, base64_encode, load_configuration, log, reverse_readline, parse_logline
- class intelmq.lib.utils.RewindableFileHandle(f, condition: ~typing.Callable | None = <function RewindableFileHandle.<lambda>>)¶
Bases:
object
Can be used to easily retrieve the last input line, e.g. to populate the raw field during CSV parsing and to handle filtering.
- intelmq.lib.utils.base64_decode(value: bytes | str) str ¶
- Parameters:
value – base64 encoded string
- Returns:
decoded string
- Return type:
retval
Notes
Possible bytes/unicode conversion problems are ignored.
- intelmq.lib.utils.base64_encode(value: bytes | str) str ¶
- Parameters:
value – string to be encoded
- Returns:
base64 representation of value
- Return type:
retval
Notes
Possible bytes/unicode conversion problems are ignored.
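The documented behavior of base64_decode and base64_encode (accept both str and bytes, return str, ignore bytes/unicode conversion problems) can be sketched with the standard library alone. The _sketch names are hypothetical stand-ins, not the IntelMQ implementations:

```python
import base64


def base64_encode_sketch(value) -> str:
    """Sketch of intelmq.lib.utils.base64_encode: str or bytes in, base64 str out."""
    if isinstance(value, str):
        # conversion problems are ignored, per the documented Notes
        value = value.encode('utf-8', errors='ignore')
    return base64.b64encode(value).decode('ascii')


def base64_decode_sketch(value) -> str:
    """Sketch of intelmq.lib.utils.base64_decode: base64 str or bytes in, decoded str out."""
    if isinstance(value, str):
        value = value.encode('ascii', errors='ignore')
    return base64.b64decode(value).decode('utf-8', errors='ignore')
```

This round-trips cleanly: `base64_decode_sketch(base64_encode_sketch('hello'))` returns `'hello'`.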
- intelmq.lib.utils.decode(text: bytes | str, encodings: Sequence[str] = ('utf-8',), force: bool = False) str ¶
Decode given string to UTF-8 (default).
- Parameters:
text – if unicode string is given, same object is returned
encodings – list/tuple of encodings to use
force – Ignore invalid characters
- Returns:
converted unicode string
- Raises:
ValueError – if decoding failed
- intelmq.lib.utils.encode(text: bytes | str, encodings: Sequence[str] = ('utf-8',), force: bool = False) bytes ¶
Encode given string from UTF-8 (default).
- Parameters:
text – if bytes string is given, same object is returned
encodings – list/tuple of encodings to use
force – Ignore invalid characters
- Returns:
converted bytes string
- Raises:
ValueError – if encoding failed
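The decode/encode contract above (try each listed encoding in order, return unchanged input of the target type, optionally force by ignoring invalid characters, otherwise raise ValueError) can be illustrated with a minimal stdlib sketch; the function name is a hypothetical stand-in for intelmq.lib.utils.decode:

```python
from typing import Sequence, Union


def decode_sketch(text: Union[bytes, str],
                  encodings: Sequence[str] = ('utf-8',),
                  force: bool = False) -> str:
    """Sketch of the documented decode behavior."""
    if isinstance(text, str):
        # if a unicode string is given, the same object is returned
        return text
    for encoding in encodings:
        try:
            return text.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue
    if force:
        # ignore invalid characters with the first usable encoding
        for encoding in encodings:
            try:
                return text.decode(encoding, errors='ignore')
            except LookupError:
                continue
    raise ValueError('Could not decode string with given encodings '
                     f'{encodings!r}.')
```

encode works symmetrically in the other direction (str to bytes).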
- intelmq.lib.utils.error_message_from_exc(exc: Exception) str ¶
>>> exc = IndexError('This is a test')
>>> error_message_from_exc(exc)
'This is a test'
- Parameters:
exc –
- Returns:
The error message of exc
- Return type:
result
- intelmq.lib.utils.file_name_from_response(response: Response) str ¶
Extract the file name from the Content-Disposition header of the Response object, or from the URL as a fallback
- Parameters:
response – a Response object retrieved from a call with the requests library
- Returns:
The file name
- Return type:
file_name
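The extraction logic can be sketched without the requests library by operating on a headers dict plus the URL; file_name_from_headers_sketch is a hypothetical helper, not the IntelMQ function, which takes a Response object instead:

```python
import re
from urllib.parse import urlsplit


def file_name_from_headers_sketch(headers: dict, url: str) -> str:
    """Sketch: prefer the Content-Disposition filename, fall back to the URL path."""
    content_disposition = headers.get('Content-Disposition', '')
    match = re.search(r'filename="?([^";]+)"?', content_disposition)
    if match:
        return match.group(1)
    # fallback: last path segment of the URL
    return urlsplit(url).path.rsplit('/', 1)[-1]
```

With a header of `attachment; filename="report.csv"` this yields `report.csv`; without the header, a URL ending in `/data.json` yields `data.json`.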
- intelmq.lib.utils.get_global_settings() dict ¶
- intelmq.lib.utils.list_all_bots() dict ¶
Compile a dictionary with all bots and their parameters.
Includes:
- the bots’ names
- the description from the docstring
- parameters including default values
For the parameters, parameters of the Bot class are excluded if they have the same value.
- intelmq.lib.utils.load_configuration(configuration_filepath: str) dict ¶
Load JSON or YAML configuration file.
- Parameters:
configuration_filepath – Path to file to load.
- Returns:
Parsed configuration
- Return type:
config
- Raises:
ValueError – if file not found
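The documented behavior (parse the file, raise ValueError rather than FileNotFoundError when it is missing) can be sketched for the JSON case; the real function also handles YAML, and the _sketch name is a hypothetical stand-in:

```python
import json


def load_configuration_sketch(configuration_filepath: str) -> dict:
    """Sketch of load_configuration, JSON only."""
    try:
        with open(configuration_filepath, encoding='utf-8') as handle:
            return json.load(handle)
    except FileNotFoundError:
        # the documented contract raises ValueError, not FileNotFoundError
        raise ValueError(f'File not found: {configuration_filepath!r}.')
```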
- intelmq.lib.utils.load_parameters(*configs: dict) Parameters ¶
Load dictionaries into new Parameters() instance.
- Parameters:
*configs – Arbitrary number of dictionaries to load.
- Returns:
class instance with items of configs as attributes
- Return type:
parameters
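Conceptually, load_parameters just sets each key of each dict as an attribute on a fresh Parameters instance, with later dicts overriding earlier ones. A minimal sketch, where the Parameters class below is a simplified stand-in for IntelMQ's:

```python
class Parameters:
    """Stand-in for intelmq.lib.utils.Parameters: a plain attribute container."""


def load_parameters_sketch(*configs: dict) -> Parameters:
    """Sketch: merge dicts into a Parameters instance, last one wins."""
    parameters = Parameters()
    for config in configs:
        for key, value in config.items():
            setattr(parameters, key, value)
    return parameters
```

For example, `load_parameters_sketch({'rate_limit': 60}, {'rate_limit': 10})` yields an object whose `rate_limit` attribute is 10.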
- intelmq.lib.utils.log(name: str, log_path: str | bool = '/opt/intelmq/var/log/', log_level: str = 'INFO', stream: object | None = None, syslog: bool | str | list | tuple = None, log_format_stream: str = '%(name)s: %(message)s', logging_level_stream: str | None = None, log_max_size: int | None = 0, log_max_copies: int | None = None)¶
- intelmq.lib.utils.parse_logline(logline: str, regex: str = '^(?P<date>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2},\\d+) - (?P<bot_id>([-\\w]+|py\\.warnings))(?P<thread_id>\\.[0-9]+)? - (?P<log_level>[A-Z]+) - (?P<message>.+)$') dict | str ¶
Parses the given logline string into its components.
- Parameters:
logline – logline to be parsed
regex – The regular expression used to parse the line
- Returns:
- dictionary with keys: [‘date’, ‘bot_id’, ‘log_level’, ‘message’]
or string if the line can’t be parsed
- Return type:
result
See also
LOG_REGEX: Regular expression for default log format of file handler SYSLOG_REGEX: Regular expression for log format of syslog
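The default regex shown in the signature above is enough to reproduce the documented behavior with the standard library; parse_logline_sketch is a hypothetical stand-in for the IntelMQ function:

```python
import re

# default log-format regex from the parse_logline signature
LOG_REGEX = (r'^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d+) - '
             r'(?P<bot_id>([-\w]+|py\.warnings))(?P<thread_id>\.[0-9]+)? - '
             r'(?P<log_level>[A-Z]+) - (?P<message>.+)$')


def parse_logline_sketch(logline: str, regex: str = LOG_REGEX):
    """Sketch: return the named groups as a dict, or the line unchanged if unparseable."""
    match = re.match(regex, logline)
    if match is None:
        return logline
    return match.groupdict()


line = '2022-01-01 12:00:00,123 - example-bot - INFO - Bot initialization completed.'
parsed = parse_logline_sketch(line)
```

Here `parsed['bot_id']` is `'example-bot'` and `parsed['log_level']` is `'INFO'`; a non-matching line is returned as-is.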
- intelmq.lib.utils.parse_relative(relative_time: str) int ¶
Parse relative time attributes and returns the corresponding minutes.
>>> parse_relative('4 hours')
240
- Parameters:
relative_time – a string holding a relative time specification
- Returns:
Minutes
- Return type:
result
- Raises:
ValueError – If relative_time is not parseable
See also
TIMESPANS: Defines the conversion of verbal timespans to minutes
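A sketch of the documented contract, with an illustrative TIMESPANS table (the values here are common-sense conversions, not necessarily IntelMQ's exact table) and a hypothetical _sketch name:

```python
# illustrative conversion table: verbal timespan -> minutes (assumption)
TIMESPANS = {'minute': 1, 'hour': 60, 'day': 24 * 60, 'week': 7 * 24 * 60}


def parse_relative_sketch(relative_time: str) -> int:
    """Sketch: '4 hours' -> 240; raises ValueError if unparseable."""
    try:
        number, unit = relative_time.strip().split()
        return int(number) * TIMESPANS[unit.rstrip('s')]
    except (ValueError, KeyError) as exc:
        raise ValueError(f'Could not parse {relative_time!r}: {exc!r}')
```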
- intelmq.lib.utils.reverse_readline(filename: str, buf_size=100000) Generator[str, None, None] ¶
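The generator contract (yield the file's lines last-to-first) can be shown with a simplified stdlib sketch; the real implementation seeks backwards through the file in buf_size chunks instead of reading it whole, and the _sketch name is a hypothetical stand-in:

```python
from typing import Generator


def reverse_readline_sketch(filename: str, buf_size: int = 100000) -> Generator[str, None, None]:
    """Sketch: yield lines of the file in reverse order (reads the whole file)."""
    with open(filename, encoding='utf-8') as handle:
        lines = handle.read().splitlines()
    yield from reversed(lines)
```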