API Reference
Core Parsing Functions
Methods for evaluating mathematical equations in strings.
- exception mathparse.mathparse.PostfixTokenEvaluationException
Exception to be raised when an expression cannot be evaluated.
- mathparse.mathparse.create_unicode_word_boundary_pattern(word: str) -> str
Create a regex pattern with Unicode-aware word boundaries.
Standard regex \b word boundaries do not work reliably with many non-ASCII scripts (e.g., Devanagari, Arabic, Hebrew, Chinese, Thai). This function creates a pattern that works across all Unicode scripts.
- Args:
word (str): The word to create a boundary pattern for
- Returns:
str: A regex pattern string with proper Unicode boundaries
- Examples:
>>> create_unicode_word_boundary_pattern("two")
'(?<![\w])two(?![\w])'
>>> create_unicode_word_boundary_pattern("दो")  # Hindi "two"
'(?:^|(?<=[\s+\-*/^()]))दो(?:$|(?=[\s+\-*/^()]))'
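The difference between the two styles of boundary can be seen with the standard `re` module alone. The sketch below is not the library's implementation: `delimiter_boundary_pattern` and `DELIMITERS` are hypothetical names, and the delimiter class is copied from the second docstring example. It shows that `\b` misses "दो" because the trailing vowel sign is a combining mark that `\w` does not match, while the delimiter-based pattern finds both occurrences.

```python
import re

# Characters treated as expression delimiters; copied from the pattern
# in the docstring example (an assumption about what counts as a delimiter).
DELIMITERS = r"[\s+\-*/^()]"


def delimiter_boundary_pattern(word: str) -> str:
    """Hypothetical helper: match `word` only between delimiters or string edges."""
    escaped = re.escape(word)
    return rf"(?:^|(?<={DELIMITERS})){escaped}(?:$|(?={DELIMITERS}))"


text = "दो + दो"

# \b needs a word character on exactly one side. The last code point of "दो"
# is a combining vowel sign, which \w does not match, so no boundary is found.
ascii_style = re.findall(r"\bदो\b", text)

# The delimiter-based pattern only looks at the neighbouring characters.
delimiter_style = re.findall(delimiter_boundary_pattern("दो"), text)

print(ascii_style)
print(delimiter_style)
```

Because the lookarounds are zero-width, the matched text is the word itself, so the pattern can be used directly for substitution.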
- mathparse.mathparse.evaluate_postfix(tokens: list) -> int | float | str | Decimal
Given a list of evaluatable tokens in postfix format, calculate a solution.
- mathparse.mathparse.extract_expression(dirty_string: str, language: str) -> str
Extract a mathematical expression from a sentence containing extra text.
This function identifies and extracts the mathematical portion from natural language sentences like: “What is 4 + 4?” or “Calculate five plus three”. It works by finding the longest sequence of mathematical symbols and words.
- Args:
- dirty_string (str): A sentence or phrase containing a mathematical
expression mixed with other text.
- language (str): ISO 639-2 language code to identify mathematical
words in the target language.
- Returns:
str: The extracted mathematical expression as a string.
- Examples:
>>> extract_expression("What is 5 plus 3?", language='ENG')
'5 plus 3'
>>> extract_expression("Please calculate two times seven", language='ENG')
'two times seven'
>>> extract_expression("The result of 10 / 2 should be 5", language=None)
'10 / 2'
- Note:
The function looks for continuous sequences of mathematical terms
Non-mathematical words at the beginning and end are stripped
The language parameter is required to identify word-based math terms
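The "longest sequence" idea described above can be illustrated with a small standalone sketch. It is not mathparse's code: `MATH_WORDS`, `is_math_token`, and `longest_math_run` are hypothetical names, and the vocabulary is a tiny hand-picked sample rather than the library's word lists.

```python
import re

# Hypothetical, deliberately tiny vocabulary; the real library ships
# per-language word lists.
MATH_WORDS = {"plus", "minus", "times", "two", "three", "five", "seven"}
SYMBOLS = set("+-*/^()")


def is_math_token(token: str) -> bool:
    """True for numbers, operator symbols, and known math words."""
    return (
        token in SYMBOLS
        or token.lower() in MATH_WORDS
        or re.fullmatch(r"\d+(?:\.\d+)?", token) is not None
    )


def longest_math_run(sentence: str) -> str:
    """Return the longest run of consecutive math tokens in `sentence`."""
    # Numbers, alphabetic words, and single operator symbols; other
    # punctuation such as "?" is dropped.
    tokens = re.findall(r"\d+(?:\.\d+)?|[A-Za-z]+|[+\-*/^()]", sentence)
    best, current = [], []
    for token in tokens:
        if is_math_token(token):
            current.append(token)
            if len(current) > len(best):
                best = list(current)
        else:
            current = []  # a non-math word ends the current run
    return " ".join(best)


print(longest_math_run("What is 5 plus 3?"))
```

Keeping only the longest run is what lets a trailing number, as in "should be 5", be ignored in favour of the fuller expression.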
- mathparse.mathparse.find_word_groups(string: str, words: list) -> list
Find matches for words in the format “3 thousand 6 hundred 2”. The words parameter should be the list of words to check for, such as “hundred”.
- mathparse.mathparse.is_binary(string: str) -> bool
Return true if the string is a defined binary operator.
- mathparse.mathparse.is_constant(string: str) -> bool
Return true if the string is a mathematical constant.
- mathparse.mathparse.is_symbol(string: str) -> bool
Return true if the string is a mathematical symbol.
- mathparse.mathparse.is_unary(string: str) -> bool
Return true if the string is a defined unary mathematical operator function.
- mathparse.mathparse.is_word(word: str, language: str) -> bool
Return true if the word is a math word for the specified language.
- mathparse.mathparse.parse(string: str, language: str = None, stopwords: set[str] = None) -> int | float | str | Decimal
Parse and evaluate a mathematical expression from a string.
This is the main entry point for mathparse. It can handle both numeric expressions (like “2 + 3 * 4”) and word-based expressions in various languages (like “five plus three” in English).
- Args:
- string (str): The mathematical expression to parse and evaluate.
Can contain numbers, operators, parentheses, constants, and functions. For word-based parsing, must use terms from the specified language.
- language (str, optional): ISO 639-2 language code for word-based
parsing. Supported codes include ‘ENG’, ‘FRE’, ‘GER’, ‘GRE’, ‘ITA’, ‘MAR’, ‘RUS’, and ‘POR’; see LANGUAGE_CODES for the full list. If None, only numeric expressions are supported.
- stopwords (set[str], optional): A set of words to ignore during
parsing. This can be used to filter out non-mathematical words in expressions.
- Returns:
- int, float, str, or Decimal: The result of the mathematical expression.
Returns ‘undefined’ for division by zero. Division operations return a Decimal object to maintain precision.
- Raises:
- InvalidLanguageCodeException:
An unsupported language code was provided.
- PostfixTokenEvaluationException:
The expression cannot be evaluated.
- Examples:
>>> parse('2 + 3 * 4')
14
>>> parse('five plus three', language='ENG')
8
>>> parse('(seven * nine) + 8 - (45 plus two)', language='ENG')
24
>>> parse('sqrt 16')
4.0
>>> parse('pi * 2')
6.283386
>>> parse('10 / 0')
'undefined'
- Note:
Follows standard order of operations (PEMDAS)
Supports mathematical constants: pi, e
Supports unary functions: sqrt, log
Each expression must use terms from a single language
Division by zero returns ‘undefined’ instead of raising an exception
- mathparse.mathparse.preprocess_unary_operators(tokens: list) -> list
Preprocess tokens to convert unary minus to the ‘neg’ function.
A minus sign is considered unary (negative) if it appears:
- At the beginning of the expression
- After an opening parenthesis ‘(’
- After a binary operator (+, -, *, /, ^)
- After a unary function (sqrt, log, neg)
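The four rules above amount to looking at the token that precedes each minus sign. A minimal sketch of that check follows; `mark_unary_minus` is a hypothetical name, the operator and function sets are taken from the rules as stated, and the library's own implementation may differ in detail.

```python
# Operator and function names as listed in the rules above.
BINARY_OPERATORS = {"+", "-", "*", "/", "^"}
UNARY_FUNCTIONS = {"sqrt", "log", "neg"}


def mark_unary_minus(tokens: list) -> list:
    """Return a copy of `tokens` with each unary '-' replaced by 'neg'."""
    result = []
    for token in tokens:
        previous = result[-1] if result else None
        in_unary_position = (
            previous is None                 # start of the expression
            or previous == "("               # right after an opening parenthesis
            or previous in BINARY_OPERATORS  # right after a binary operator
            or previous in UNARY_FUNCTIONS   # right after a unary function
        )
        result.append("neg" if token == "-" and in_unary_position else token)
    return result


print(mark_unary_minus(["-", "5", "+", "3"]))
print(mark_unary_minus(["4", "-", "1"]))
```

A minus that follows an operand or a closing parenthesis matches none of the rules and is left as binary subtraction.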
- mathparse.mathparse.replace_word_tokens(string: str, language: str, stopwords: set[str] = None) -> str
Replace word-based mathematical terms with their symbolic equivalents.
Given a string and an ISO 639-2 language code, return the string with the words replaced with an operational equivalent.
- Args:
- string (str): The input string containing a mathematical expression
with words.
language (str): ISO 639-2 language code for word-based parsing.
- stopwords (set[str], optional): A set of words to ignore during
parsing. This can be used to filter out non-mathematical words in expressions.
- Returns:
- str: The input string with word-based mathematical terms replaced
with their symbolic equivalents.
- mathparse.mathparse.replace_word_tokens_simplified_chinese(string, stopwords: set[str] = None) -> str
Simplified Chinese version of replace_word_tokens: given a string, return the string with the words replaced with an operational equivalent.
- mathparse.mathparse.to_number(val) -> int | float | str | Decimal
Convert a string to an int or float if possible.
- mathparse.mathparse.to_postfix(tokens: list) -> list
Convert a list of evaluatable tokens to postfix format.
- mathparse.mathparse.tokenize(string: str, language: str = None, escape: str = '___') -> list
Convert a string into a list of mathematical tokens for processing.
- Args:
string (str): The input string containing a mathematical expression.
- language (str, optional): ISO 639-2 language code for word-based
parsing. If None, only numeric expressions are supported.
- escape (str, optional): A string used to temporarily replace spaces
in multi-word phrases during tokenization. Default is ‘___’.
- Returns:
list: A list of tokens extracted from the input string.
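The role of the escape string can be shown with a small standalone sketch. It assumes the multi-word phrases are known in advance: `tokenize_with_phrases` and its `phrases` argument are hypothetical and are not part of the mathparse API. Spaces inside a phrase are replaced by the escape string so that a plain whitespace split keeps the phrase together, then the spaces are restored.

```python
def tokenize_with_phrases(text: str, phrases: list, escape: str = "___") -> list:
    """Split on whitespace while keeping each multi-word phrase as one token."""
    lowered = text.lower()
    # Longest phrases first, so a longer phrase is escaped before any
    # shorter phrase it happens to contain.
    for phrase in sorted(phrases, key=len, reverse=True):
        lowered = lowered.replace(phrase, phrase.replace(" ", escape))
    # Split on whitespace, then restore the spaces inside each phrase.
    return [token.replace(escape, " ") for token in lowered.split()]


print(tokenize_with_phrases("ten divided by two", ["divided by"]))
```

The escape string only needs to be something that cannot occur in the input, which is presumably why it is exposed as a parameter.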
The main parsing module contains all the core functionality for mathematical expression parsing.
Main Parse Function
- mathparse.mathparse.parse(string: str, language: str = None, stopwords: set[str] = None) -> int | float | str | Decimal
This is the primary function for parsing mathematical expressions. It accepts both numeric and word-based expressions.
Parameters:
- string (str): The mathematical expression to parse
- language (str, optional): ISO 639-2 language code for word-based parsing
- stopwords (set[str], optional): A set of words to ignore during parsing
Returns:
- int, float, str, or Decimal: The result of the mathematical expression, or ‘undefined’ for division by zero
Example:
from mathparse import mathparse
# Numeric expression
result = mathparse.parse('2 + 3 * 4')
# Returns: 14
# Word-based expression
result = mathparse.parse('five times six plus ten', language='ENG')
# Returns: 40
Expression Extraction
- mathparse.mathparse.extract_expression(dirty_string: str, language: str) -> str
This function extracts mathematical expressions from sentences containing additional text.
Example:
from mathparse import mathparse
# Extract the expression from a question
expression = mathparse.extract_expression("What is 5 plus 3?", language='ENG')
# Returns: "5 plus 3"
# Evaluate the extracted expression
result = mathparse.parse(expression, language='ENG')
# Returns: 8
Tokenization Functions
- mathparse.mathparse.tokenize(string: str, language: str = None, escape: str = '___') -> list
Converts a string into mathematical tokens that can be processed.
- mathparse.mathparse.replace_word_tokens(string: str, language: str, stopwords: set[str] = None) -> str
Replaces word-based mathematical terms with their symbolic equivalents.
Evaluation Functions
- mathparse.mathparse.to_postfix(tokens: list) -> list
Converts mathematical tokens to postfix notation for evaluation.
- mathparse.mathparse.evaluate_postfix(tokens: list) -> int | float | str | Decimal
Evaluates a postfix expression and returns the result.
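The two-step flow (infix tokens to postfix, then postfix to a value) can be sketched with the classic shunting-yard algorithm and a stack evaluator. This is an illustration under stated assumptions, not the library's source: `infix_to_postfix` and `evaluate` are hypothetical names, tokens are assumed to be integers and operator strings with balanced parentheses, and unary functions and constants are left out.

```python
import operator
from decimal import Decimal

PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2, "^": 3}
RIGHT_ASSOCIATIVE = {"^"}
OPERATIONS = {"+": operator.add, "-": operator.sub,
              "*": operator.mul, "^": operator.pow}


def infix_to_postfix(tokens: list) -> list:
    """Shunting-yard conversion; assumes balanced parentheses."""
    output, stack = [], []
    for token in tokens:
        if token in PRECEDENCE:
            # Pop operators that bind at least as tightly as the new one
            # (strictly tighter when the new operator is right-associative).
            while stack and stack[-1] in PRECEDENCE and (
                PRECEDENCE[stack[-1]] > PRECEDENCE[token]
                or (PRECEDENCE[stack[-1]] == PRECEDENCE[token]
                    and token not in RIGHT_ASSOCIATIVE)
            ):
                output.append(stack.pop())
            stack.append(token)
        elif token == "(":
            stack.append(token)
        elif token == ")":
            while stack[-1] != "(":
                output.append(stack.pop())
            stack.pop()  # discard the matching "("
        else:
            output.append(token)  # operand
    while stack:
        output.append(stack.pop())
    return output


def evaluate(postfix: list):
    """Evaluate postfix tokens; division goes through Decimal."""
    stack = []
    for token in postfix:
        if token in PRECEDENCE:
            right, left = stack.pop(), stack.pop()
            if token == "/":
                if right == 0:
                    return "undefined"  # mirrors the documented behaviour
                stack.append(Decimal(left) / Decimal(right))
            else:
                stack.append(OPERATIONS[token](left, right))
        else:
            stack.append(token)
    return stack.pop()


print(evaluate(infix_to_postfix([2, "+", 3, "*", 4])))
```

Postfix order removes the need for parentheses and precedence checks at evaluation time, which is why the conversion is done as a separate step.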
Utility Functions
- mathparse.mathparse.is_constant(string: str) -> bool
Return true if the string is a mathematical constant.
- mathparse.mathparse.is_unary(string: str) -> bool
Return true if the string is a defined unary mathematical operator function.
- mathparse.mathparse.is_binary(string: str) -> bool
Return true if the string is a defined binary operator.
Language and Word Support
Utility methods for getting math word terms.
- exception mathparse.mathwords.InvalidLanguageCodeException
Exception to be raised when a language code is specified that is not a part of the ISO 639-2 standard, or if the specified language is not yet supported by mathparse.
- mathparse.mathwords.word_groups_for_language(language_code: str) -> dict[str, dict[str, str]]
Return the math word groups for a language code. The language_code should be an ISO 639-2 language code (see https://www.loc.gov/standards/iso639-2/php/code_list.php).
- mathparse.mathwords.words_for_language(language_code: str) -> set[str]
Return the math words for a language code. The language_code should be an ISO 639-2 language code (see https://www.loc.gov/standards/iso639-2/php/code_list.php).
This module provides language-specific mathematical terms and utility functions.
Language Functions
- mathparse.mathwords.words_for_language(language_code: str) -> set[str]
Returns all mathematical words for a specific language.
Parameters:
- language_code (str): ISO 639-2 language code
Returns:
- set[str]: All mathematical words for the language
Example:
from mathparse.mathwords import words_for_language
english_words = words_for_language('ENG')
# Returns a set such as {'plus', 'minus', 'times', 'one', 'two', ...} (sets are unordered)
- mathparse.mathwords.word_groups_for_language(language_code: str) -> dict[str, dict[str, str]]
Returns organized groups of mathematical words (operators, numbers, scales) for a language.
Parameters:
- language_code (str): ISO 639-2 language code
Returns:
- dict: Dictionary containing word groups
Example:
from mathparse.mathwords import word_groups_for_language
groups = word_groups_for_language('ENG')
# Returns: {
# 'binary_operators': {'plus': '+', 'minus': '-', ...},
# 'numbers': {'one': 1, 'two': 2, ...},
# 'scales': {'hundred': 100, 'thousand': 1000, ...}
# }
Constants and Functions in Utils
Mathematical Constants in Utils
mathparse includes common mathematical constants:
- mathparse.mathwords.CONSTANTS
Dictionary of mathematical constants available in expressions.
Available constants:
- pi: 3.141693
- e: 2.718281
Example:
result = mathparse.parse('pi * 2')
# Returns: 6.283386
Unary Functions in Utils
- mathparse.mathwords.UNARY_FUNCTIONS
Dictionary of available unary mathematical functions.
Available functions:
- sqrt: Square root function
- log: Base-10 logarithm function
- neg: Negative (unary minus) function
Example:
result = mathparse.parse('sqrt 16')
# Returns: 4.0
result = mathparse.parse('log 100')
# Returns: 2.0
result = mathparse.parse('negative five plus ten', language='ENG')
# Returns: 5
Binary Operators
- mathparse.mathwords.BINARY_OPERATORS
Set of supported binary mathematical operators.
Includes:
{'^', '*', '/', '+', '-', '.'}
The decimal point ('.') operator combines integer and fractional parts to create decimal numbers. For example, in the expression 53 . 4, the decimal operator combines 53 and 4 to produce 53.4.
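One way to picture the decimal point operator is as string concatenation around a '.', followed by conversion to a number. The helper below is a hypothetical illustration (`combine_decimal` is not a mathparse function), and the library may combine the operands differently.

```python
from decimal import Decimal


def combine_decimal(integer_part, fractional_part) -> Decimal:
    """Join two operands around a decimal point, e.g. 53 and 4 give 53.4."""
    # Building the value from text keeps the digits exactly as written,
    # including leading zeros when the fractional part is passed as a string.
    return Decimal(f"{integer_part}.{fractional_part}")


print(combine_decimal(53, 4))
```

Passing the fractional part as an integer would lose leading zeros (for example "05" becomes 5), which is why the sketch accepts strings as well.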
Exceptions
- exception mathparse.mathparse.PostfixTokenEvaluationException
Raised when there’s an error evaluating postfix tokens.
- exception mathparse.mathwords.InvalidLanguageCodeException
Raised when an invalid or unsupported language code is provided.
Example:
from mathparse import mathparse
from mathparse.mathwords import InvalidLanguageCodeException
try:
    result = mathparse.parse('five plus three', language='INVALID')
except InvalidLanguageCodeException as e:
    print(f"Language error: {e}")
Language Codes
- mathparse.mathwords.LANGUAGE_CODES
List of supported ISO 639-2 language codes.
Currently supported:
['CHI', 'DUT', 'ENG', 'ESP', 'FRE', 'GER', 'GRE', 'ITA', 'MAR', 'POR', 'RUS', 'THA', 'UKR']