API Reference

Core Parsing Functions

Methods for evaluating mathematical equations in strings.

exception mathparse.mathparse.PostfixTokenEvaluationException

Exception to be raised when an expression cannot be evaluated.

mathparse.mathparse.create_unicode_word_boundary_pattern(word: str) -> str

Create a regex pattern with Unicode-aware word boundaries.

Standard regex \b word boundaries don’t work reliably with non-ASCII characters (e.g., Devanagari, Arabic, Hebrew, Chinese, Thai). This function creates a pattern that works across all Unicode scripts.

Args:

word (str): The word to create a boundary pattern for

Returns:

str: A regex pattern string with proper Unicode boundaries

Examples:
>>> create_unicode_word_boundary_pattern("two")
'(?<![\w])two(?![\w])'
>>> create_unicode_word_boundary_pattern("दो")  # Hindi "two"
'(?:^|(?<=[\s+\-*/^()]))दो(?:$|(?=[\s+\-*/^()]))'
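
The delimiter-based form shown in the second example can be reproduced with the standard re module. The following is a minimal sketch, not the library's implementation: the helper name sketch_boundary_pattern and the DELIMITERS constant are hypothetical, and the delimiter set is simply copied from the example output above.

```python
import re

# Assumed token delimiters: whitespace, arithmetic operators, parentheses.
DELIMITERS = r"[\s+\-*/^()]"


def sketch_boundary_pattern(word):
    """Hypothetical helper: match `word` only when it sits between
    delimiters or at the edges of the string."""
    escaped = re.escape(word)
    return rf"(?:^|(?<={DELIMITERS})){escaped}(?:$|(?={DELIMITERS}))"


pattern = sketch_boundary_pattern("दो")

# The word is preceded by a space and followed by the end of the string.
found = re.search(pattern, "पाँच + दो")

# The word is embedded in other letters, so it should not match.
embedded = re.search(pattern, "xदोy")

print(found is not None, embedded is None)
```

Lookarounds on explicit delimiters avoid relying on \w, which does not cover combining vowel signs such as the one in "दो".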

mathparse.mathparse.evaluate_postfix(tokens: list) -> int | float | str | Decimal

Given a list of evaluatable tokens in postfix format, calculate a solution.

mathparse.mathparse.extract_expression(dirty_string: str, language: str) -> str

Extract a mathematical expression from a sentence containing extra text.

This function identifies and extracts the mathematical portion from natural language sentences like: “What is 4 + 4?” or “Calculate five plus three”. It works by finding the longest sequence of mathematical symbols and words.

Args:

dirty_string (str): A sentence or phrase containing a mathematical expression mixed with other text.

language (str): ISO 639-2 language code to identify mathematical words in the target language.

Returns:

str: The extracted mathematical expression as a string.

Examples:
>>> extract_expression("What is 5 plus 3?", language='ENG')
'5 plus 3'
>>> extract_expression(
        "Please calculate two times seven", language='ENG'
    )
'two times seven'
>>> extract_expression(
        "The result of 10 / 2 should be 5", language=None
    )
'10 / 2'
Note:
  • The function looks for continuous sequences of mathematical terms

  • Non-mathematical words at the beginning and end are stripped

  • The language parameter is required to identify word-based math terms

mathparse.mathparse.find_word_groups(string: str, words: list) -> list

Find matches for words in the format “3 thousand 6 hundred 2”. The words parameter should be the list of words to check for such as “hundred”.
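
To illustrate the “3 thousand 6 hundred 2” format, here is a small regex-based sketch. It is an assumption about how such groups could be located, not the library's code; sketch_find_word_groups is a hypothetical name and the exact return format of find_word_groups may differ.

```python
import re


def sketch_find_word_groups(string, words):
    """Hypothetical helper: find runs like '<digits> <scale> <digits> <scale> <digits>'."""
    scale = "|".join(re.escape(w) for w in words)
    pattern = re.compile(
        # digits + scale word, repeated, with an optional trailing number
        rf"\d+\s+(?:{scale})(?:\s+\d+\s+(?:{scale}))*(?:\s+\d+)?"
    )
    return pattern.findall(string)


groups = sketch_find_word_groups(
    "3 thousand 6 hundred 2 plus 4", ["thousand", "hundred"]
)
print(groups)
```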

mathparse.mathparse.is_binary(string: str) -> bool

Return true if the string is a defined binary operator.

mathparse.mathparse.is_constant(string: str) -> bool

Return true if the string is a mathematical constant.

mathparse.mathparse.is_float(string: str) -> bool

Return true if the string is a float.

mathparse.mathparse.is_int(string: str) -> bool

Return true if the string is an integer.

mathparse.mathparse.is_symbol(string: str) -> bool

Return true if the string is a mathematical symbol.

mathparse.mathparse.is_unary(string: str) -> bool

Return true if the string is a defined unary mathematical operator function.

mathparse.mathparse.is_word(word: str, language: str) -> bool

Return true if the word is a math word for the specified language.

mathparse.mathparse.parse(string: str, language: str = None, stopwords: set[str] = None) -> int | float | str | Decimal

Parse and evaluate a mathematical expression from a string.

This is the main entry point for mathparse. It can handle both numeric expressions (like “2 + 3 * 4”) and word-based expressions in various languages (like “five plus three” in English).

Args:

string (str): The mathematical expression to parse and evaluate. Can contain numbers, operators, parentheses, constants, and functions. For word-based parsing, must use terms from the specified language.

language (str, optional): ISO 639-2 language code for word-based parsing. Supported codes include ‘ENG’, ‘FRE’, ‘GER’, ‘GRE’, ‘ITA’, ‘MAR’, ‘RUS’, and ‘POR’; see mathparse.mathwords.LANGUAGE_CODES for the full list. If None, only numeric expressions are supported.

stopwords (set[str], optional): A set of words to ignore during parsing. This can be used to filter out non-mathematical words in expressions.

Returns:

int, float, str, or Decimal: The result of the mathematical expression. Returns ‘undefined’ for division by zero. Division operations return a Decimal object to maintain precision.

Raises:

InvalidLanguageCodeException: An unsupported language code was provided.

PostfixTokenEvaluationException: The expression cannot be evaluated.

Examples:
>>> parse('2 + 3 * 4')
14
>>> parse('five plus three', language='ENG')
8
>>> parse('(seven * nine) + 8 - (45 plus two)', language='ENG')
24
>>> parse('sqrt 16')
4.0
>>> parse('pi * 2')
6.283386
>>> parse('10 / 0')
'undefined'
Note:
  • Follows standard order of operations (PEMDAS)

  • Supports mathematical constants: pi, e

  • Supports unary functions: sqrt, log

  • Each expression must use terms from a single language

  • Division by zero returns ‘undefined’ instead of raising an exception
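
Because the result can be an int, a float, a Decimal, or the string ‘undefined’, calling code may want to normalize it before doing further arithmetic. The sketch below is one possible approach; normalize_result is a hypothetical helper, not part of mathparse, and it assumes that any string result means ‘undefined’.

```python
from decimal import Decimal


def normalize_result(result):
    """Hypothetical helper: return a float for numeric results,
    or None when the result is the string 'undefined'."""
    if isinstance(result, str):
        # Assumption: the only string result is 'undefined'.
        return None
    return float(result)


# Values standing in for what parse() might return.
print(normalize_result(14))
print(normalize_result(Decimal("5")))
print(normalize_result("undefined"))
```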

mathparse.mathparse.preprocess_unary_operators(tokens: list) -> list

Preprocess tokens to convert unary minus to the ‘neg’ function.

A minus sign is considered unary (negative) if it appears:

  • At the beginning of the expression

  • After an opening parenthesis ‘(’

  • After a binary operator (+, -, *, /, ^)

  • After a unary function (sqrt, log, neg)
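
The rules for a unary minus can be sketched in a few lines. This is an illustrative reimplementation under the stated rules, not the library's source; sketch_preprocess_unary and the two sets are hypothetical names, and tokens are assumed to be strings.

```python
BINARY = {"+", "-", "*", "/", "^"}
UNARY = {"sqrt", "log", "neg"}


def sketch_preprocess_unary(tokens):
    """Hypothetical helper: rewrite a unary '-' as 'neg'."""
    result = []
    for tok in tokens:
        prev = result[-1] if result else None
        is_unary_position = (
            prev is None          # beginning of the expression
            or prev == "("        # after an opening parenthesis
            or prev in BINARY     # after a binary operator
            or prev in UNARY      # after a unary function
        )
        if tok == "-" and is_unary_position:
            result.append("neg")
        else:
            result.append(tok)
    return result


print(sketch_preprocess_unary(["-", "5", "+", "10"]))
print(sketch_preprocess_unary(["3", "-", "5"]))
```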

mathparse.mathparse.replace_word_tokens(string: str, language: str, stopwords: set[str] = None) -> str

Replace word-based mathematical terms with their symbolic equivalents.

Given a string and an ISO 639-2 language code, return the string with the words replaced with an operational equivalent.

Args:

string (str): The input string containing a mathematical expression with words.

language (str): ISO 639-2 language code for word-based parsing.

stopwords (set[str], optional): A set of words to ignore during parsing. This can be used to filter out non-mathematical words in expressions.

Returns:

str: The input string with word-based mathematical terms replaced with their symbolic equivalents.

mathparse.mathparse.replace_word_tokens_simplified_chinese(string, stopwords: set[str] = None) -> str

Simplified Chinese version of replace_word_tokens: given a string, return the string with the words replaced with an operational equivalent.

mathparse.mathparse.to_number(val) -> int | float | str | Decimal

Convert a string to an int or float if possible.

mathparse.mathparse.to_postfix(tokens: list) -> list

Convert a list of evaluatable tokens to postfix format.

mathparse.mathparse.tokenize(string: str, language: str = None, escape: str = '___') -> list

Convert a string into a list of mathematical tokens for processing.

Args:

string (str): The input string containing a mathematical expression.

language (str, optional): ISO 639-2 language code for word-based parsing. If None, only numeric expressions are supported.

escape (str, optional): A string used to temporarily replace spaces in multi-word phrases during tokenization. Default is ‘___’.

Returns:

list: A list of tokens extracted from the input string.
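
The role of the escape string can be shown with a short sketch: spaces inside multi-word phrases are replaced before splitting on whitespace, then restored. This is an assumption about the general technique, not the library's tokenizer; sketch_tokenize and its phrases argument are hypothetical.

```python
def sketch_tokenize(string, phrases, escape="___"):
    """Hypothetical helper: keep multi-word phrases together as one token."""
    # Replace longer phrases first so they are not split by shorter ones.
    for phrase in sorted(phrases, key=len, reverse=True):
        string = string.replace(phrase, phrase.replace(" ", escape))
    # Split on whitespace, then restore the spaces inside each phrase.
    return [token.replace(escape, " ") for token in string.split()]


tokens = sketch_tokenize("ten divided by two", ["divided by"])
print(tokens)
```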

The main parsing module contains all the core functionality for mathematical expression parsing.

Main Parse Function

mathparse.mathparse.parse(string: str, language: str = None, stopwords: set[str] = None) -> int | float | str | Decimal

This is the primary function for parsing mathematical expressions. It accepts both numeric and word-based expressions.

Parameters:

  • string (str): The mathematical expression to parse

  • language (str, optional): ISO 639-2 language code for word-based parsing

  • stopwords (set[str], optional): A set of words to ignore during parsing

Returns:

  • int, float, str, or Decimal: The result of the mathematical expression, or ‘undefined’ for division by zero

Example:

from mathparse import mathparse

# Numeric expression
result = mathparse.parse('2 + 3 * 4')
# Returns: 14

# Word-based expression
result = mathparse.parse('five times six plus ten', language='ENG')
# Returns: 40

Expression Extraction

mathparse.mathparse.extract_expression(dirty_string: str, language: str) -> str

This function extracts mathematical expressions from sentences containing additional text.

Example:

from mathparse import mathparse

# Extract from a question
expression = mathparse.extract_expression("What is 5 plus 3?", language='ENG')
# Returns: "5 plus 3"

result = mathparse.parse(expression, language='ENG')
# Returns: 8

Tokenization Functions

mathparse.mathparse.tokenize(string: str, language: str = None, escape: str = '___') -> list

Converts a string into mathematical tokens that can be processed.

mathparse.mathparse.replace_word_tokens(string: str, language: str, stopwords: set[str] = None) -> str

Replaces word-based mathematical terms with their symbolic equivalents.

Evaluation Functions

mathparse.mathparse.to_postfix(tokens: list) -> list

Convert a list of evaluatable tokens to postfix format.

Converts mathematical tokens to postfix notation for evaluation.

mathparse.mathparse.evaluate_postfix(tokens: list) -> int | float | str | Decimal

Given a list of evaluatable tokens in postfix format, calculate a solution.

Evaluates a postfix expression and returns the result.
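
The two steps above (infix to postfix, then evaluation) can be illustrated with a compact shunting-yard sketch. It is a simplified stand-in rather than the library's implementation: the function names are hypothetical, tokens are assumed to be already-converted numbers and operator strings, unary functions are omitted, and division by zero returns ‘undefined’ only to mirror the behaviour documented for parse.

```python
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOCIATIVE = {"^"}


def sketch_to_postfix(tokens):
    """Hypothetical helper: shunting-yard conversion of infix tokens."""
    output, stack = [], []
    for tok in tokens:
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly (left-associative).
            while stack and stack[-1] in PRECEDENCE and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[tok]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[tok]
                    and tok not in RIGHT_ASSOCIATIVE)
            ):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the matching '('
        else:
            output.append(tok)  # operand
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


def sketch_evaluate_postfix(tokens):
    """Hypothetical helper: evaluate postfix tokens with a value stack."""
    stack = []
    for tok in tokens:
        if tok in PRECEDENCE:
            right, left = stack.pop(), stack.pop()
            if tok == "+":
                stack.append(left + right)
            elif tok == "-":
                stack.append(left - right)
            elif tok == "*":
                stack.append(left * right)
            elif tok == "^":
                stack.append(left ** right)
            elif right == 0:
                return "undefined"  # division by zero
            else:
                stack.append(left / right)
        else:
            stack.append(tok)
    return stack.pop()


postfix = sketch_to_postfix([2, "+", 3, "*", 4])
print(postfix, sketch_evaluate_postfix(postfix))
```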

Utility Functions

mathparse.mathparse.is_int(string: str) -> bool

Return true if the string is an integer.

mathparse.mathparse.is_float(string: str) -> bool

Return true if the string is a float.

mathparse.mathparse.is_constant(string: str) -> bool

Return true if the string is a mathematical constant.

mathparse.mathparse.is_unary(string: str) -> bool

Return true if the string is a defined unary mathematical operator function.

mathparse.mathparse.is_binary(string: str) -> bool

Return true if the string is a defined binary operator.

mathparse.mathparse.is_symbol(string: str) -> bool

Return true if the string is a mathematical symbol.

mathparse.mathparse.is_word(word: str, language: str) -> bool

Return true if the word is a math word for the specified language.

Language and Word Support

Utility methods for getting math word terms.

exception mathparse.mathwords.InvalidLanguageCodeException

Exception to be raised when a language code is specified that is not a part of the ISO 639-2 standard, or if the specified language is not yet supported by mathparse.

mathparse.mathwords.word_groups_for_language(language_code: str) -> dict[str, dict[str, str]]

Return the math word groups for a language code. The language_code should be an ISO 639-2 language code. See https://www.loc.gov/standards/iso639-2/php/code_list.php

mathparse.mathwords.words_for_language(language_code: str) -> set[str]

Return the math words for a language code. The language_code should be an ISO 639-2 language code. See https://www.loc.gov/standards/iso639-2/php/code_list.php

This module provides language-specific mathematical terms and utility functions.

Language Functions

mathparse.mathwords.words_for_language(language_code: str) -> set[str]

Return the math words for a language code. The language_code should be an ISO 639-2 language code. See https://www.loc.gov/standards/iso639-2/php/code_list.php

Returns all mathematical words for a specific language.

Parameters:

  • language_code (str): ISO 639-2 language code

Returns:

  • set[str]: All mathematical words for the language

Example:

from mathparse.mathwords import words_for_language

english_words = words_for_language('ENG')
# Returns a set such as: {'plus', 'minus', 'times', 'one', 'two', ...}

mathparse.mathwords.word_groups_for_language(language_code: str) -> dict[str, dict[str, str]]

Return the math word groups for a language code. The language_code should be an ISO 639-2 language code. https://www.loc.gov/standards/iso639-2/php/code_list.php

Returns organized groups of mathematical words (operators, numbers, scales) for a language.

Parameters:

  • language_code (str): ISO 639-2 language code

Returns:

  • dict: Dictionary containing word groups

Example:

from mathparse.mathwords import word_groups_for_language

groups = word_groups_for_language('ENG')
# Returns: {
#     'binary_operators': {'plus': '+', 'minus': '-', ...},
#     'numbers': {'one': 1, 'two': 2, ...},
#     'scales': {'hundred': 100, 'thousand': 1000, ...}
# }

Constants and Functions in Utils

Mathematical Constants in Utils

mathparse includes common mathematical constants:

mathparse.mathwords.CONSTANTS

Dictionary of mathematical constants available in expressions.

Available constants:

  • pi: 3.141693

  • e: 2.718281

Example:

result = mathparse.parse('pi * 2')
# Returns: 6.283386

Unary Functions in Utils

mathparse.mathwords.UNARY_FUNCTIONS

Dictionary of available unary mathematical functions.

Available functions:

  • sqrt: Square root function

  • log: Base-10 logarithm function

  • neg: Negative (unary minus) function

Example:

result = mathparse.parse('sqrt 16')
# Returns: 4.0

result = mathparse.parse('log 100')
# Returns: 2.0

result = mathparse.parse('negative five plus ten', language='ENG')
# Returns: 5

Binary Operators

mathparse.mathwords.BINARY_OPERATORS

Set of supported binary mathematical operators.

Includes: {'^', '*', '/', '+', '-', '.'}

The decimal point ('.') operator combines integer and fractional parts to create decimal numbers. For example, in the expression 53 . 4, the decimal operator combines 53 and 4 to produce 53.4.
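
The effect of the decimal point operator can be illustrated with a one-line helper. This is only a sketch of the idea; sketch_decimal_point is a hypothetical name, and building a Decimal from the joined digits is an assumption, not a description of how mathparse computes the value.

```python
from decimal import Decimal


def sketch_decimal_point(whole, fraction):
    """Hypothetical helper: join integer and fractional digits into one number."""
    # Operands are taken as strings so leading zeros in the fraction survive.
    return Decimal(f"{whole}.{fraction}")


value = sketch_decimal_point("53", "4")
print(value)
```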

Exceptions

exception mathparse.mathparse.PostfixTokenEvaluationException

Exception to be raised when an expression cannot be evaluated.

Raised when there’s an error evaluating postfix tokens.

exception mathparse.mathwords.InvalidLanguageCodeException

Exception to be raised when a language code is specified that is not a part of the ISO 639-2 standard, or if the specified language is not yet supported by mathparse.

Raised when an invalid or unsupported language code is provided.

Example:

from mathparse import mathparse
from mathparse.mathwords import InvalidLanguageCodeException

try:
    result = mathparse.parse('five plus three', language='INVALID')
except InvalidLanguageCodeException as e:
    print(f"Language error: {e}")

Language Codes

mathparse.mathwords.LANGUAGE_CODES

List of supported ISO 639-2 language codes.

Currently supported: ['CHI', 'DUT', 'ENG', 'ESP', 'FRE', 'GER', 'GRE', 'ITA', 'MAR', 'POR', 'RUS', 'THA', 'UKR']