Parsing docstrings

This module contains functions and classes that parse docstrings.

AUTHORS:

  • David Roe (2012-03-27) – initial version, based on Robert Bradshaw’s code.

  • Jeroen Demeyer (2014-08-28) – much improved handling of tolerances using interval arithmetic (trac ticket #16889).

class sage.doctest.parsing.MarkedOutput

Bases: str

A subclass of str that carries context (such as tolerances) used to decide whether another string matches it.

EXAMPLES:

sage: from sage.doctest.parsing import MarkedOutput
sage: s = MarkedOutput("abc")
sage: s.rel_tol
0
sage: s.update(rel_tol = .05)
u'abc'
sage: s.rel_tol
0.0500000000000000

sage: MarkedOutput(u"56 µs")
u'56 µs'
update(**kwds)

Set the given keyword arguments as attributes and return the string itself.

EXAMPLES:

sage: from sage.doctest.parsing import MarkedOutput
sage: s = MarkedOutput("0.0007401")
sage: s.update(abs_tol = .0000001)
u'0.0007401'
sage: s.rel_tol
0
sage: s.abs_tol
1.00000000000000e-7
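To illustrate the idea outside Sage, here is a minimal plain-Python sketch of a str subclass that compares like ordinary text but carries tolerance attributes, with update() returning the string itself as in the examples above. TaggedStr is a hypothetical name, and the class-level defaults of 0 are an assumption suggested by the rel_tol output; this is not MarkedOutput's actual source.

```python
class TaggedStr(str):
    """Hypothetical stand-in for MarkedOutput: a str carrying tolerance metadata."""

    # Class-level defaults, used whenever an instance has not set its own value.
    random = False
    rel_tol = 0
    abs_tol = 0
    tol = 0

    def update(self, **kwds):
        # Instances of str subclasses have a __dict__, so attributes can live there.
        self.__dict__.update(kwds)
        # Returning self allows TaggedStr("10.0").update(abs_tol=0.1) in one expression.
        return self


s = TaggedStr("0.0007401").update(abs_tol=1e-7)
print(s == "0.0007401", s.abs_tol, s.rel_tol)  # True 1e-07 0
```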
class sage.doctest.parsing.OriginalSource(example)

Bases: object

Context manager that swaps out the preparsed source for the original source, for better error reporting.

EXAMPLES:

sage: from sage.doctest.sources import FileDocTestSource
sage: from sage.doctest.control import DocTestDefaults
sage: from sage.env import SAGE_SRC
sage: import os
sage: filename = os.path.join(SAGE_SRC,'sage','doctest','forker.py')
sage: FDS = FileDocTestSource(filename,DocTestDefaults())
sage: doctests, extras = FDS.create_doctests(globals())
sage: ex = doctests[0].examples[0]
sage: ex.sage_source
u'doctest_var = 42; doctest_var^2\n'
sage: ex.source
u'doctest_var = Integer(42); doctest_var**Integer(2)\n'
sage: from sage.doctest.parsing import OriginalSource
sage: with OriginalSource(ex):
....:     ex.source
u'doctest_var = 42; doctest_var^2\n'
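A minimal sketch of such a context manager, assuming it only needs to exchange the source and sage_source attributes and restore the former on exit. SwapInOriginalSource is a hypothetical name, and SimpleNamespace stands in for a real doctest example.

```python
from types import SimpleNamespace


class SwapInOriginalSource:
    """Hypothetical context manager in the spirit of OriginalSource."""

    def __init__(self, example):
        self.example = example

    def __enter__(self):
        # Remember the preparsed source so it can be restored afterwards.
        self.saved = self.example.source
        if hasattr(self.example, "sage_source"):
            self.example.source = self.example.sage_source
        return self.example

    def __exit__(self, exc_type, exc_value, traceback):
        # Restore the preparsed source even if the block raised an exception.
        self.example.source = self.saved
        return False  # do not swallow exceptions


ex = SimpleNamespace(sage_source="doctest_var^2\n",
                     source="doctest_var**Integer(2)\n")
with SwapInOriginalSource(ex):
    print(ex.source, end="")  # doctest_var^2
print(ex.source, end="")      # doctest_var**Integer(2)
```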
sage.doctest.parsing.RIFtol(*args)

Create an element of the real interval field used for doctest tolerances.

It allows large numbers such as 1e1000, parses strings containing spaces such as RIF(" - 1 ") out of the box, and carries a lot of precision. The high precision is useful for testing libraries, such as PARI, that use arbitrary precision but do not guarantee correct rounding. We use 1044 bits of precision, which should be enough to handle tolerances on numbers computed with 1024 bits of precision.

The interval approach also means that we do not need to worry about rounding errors, and it is natural to view a number with a tolerance as an interval.

EXAMPLES:

sage: from sage.doctest.parsing import RIFtol
sage: RIFtol(-1, 1)
0.?
sage: RIFtol(" - 1 ")
-1
sage: RIFtol("1e1000")
1.00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000?e1000
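RIFtol relies on Sage's real interval field. As a rough plain-Python analogue, assuming exact rational arithmetic is an acceptable substitute for 1044-bit intervals, the fractions module can parse the same kinds of strings (after removing spaces) and check a tolerance without any rounding. The helpers exact, tolerance_interval and contains are hypothetical.

```python
from fractions import Fraction


def exact(text):
    # Hypothetical helper: drop all spaces so " - 1 " parses like "-1";
    # Fraction reads decimal and exponent notation such as "1e1000" exactly.
    return Fraction(str(text).replace(" ", ""))


def tolerance_interval(center, radius):
    """Return the closed interval [center - radius, center + radius] as a pair."""
    c, r = exact(center), abs(exact(radius))
    return (c - r, c + r)


def contains(interval, value):
    lo, hi = interval
    return lo <= exact(value) <= hi


print(exact(" - 1 "))               # -1
print(exact("1e1000") == 10**1000)  # True
# With binary floats, 1.00001 - 1.0 comes out slightly larger than 1e-5 ...
print(abs(1.00001 - 1.0) <= 1e-5)   # False
# ... whereas the exact check accepts it, as "# abs tol 1e-5" should.
print(contains(tolerance_interval("1.0", "1e-5"), "1.00001"))  # True
```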
class sage.doctest.parsing.SageDocTestParser(optional_tags=(), long=False)

Bases: doctest.DocTestParser

A version of the standard doctest parser which handles Sage’s custom options and tolerances in floating point arithmetic.

parse(string, *args)

A Sage specialization of doctest.DocTestParser.

INPUT:

  • string – the string to parse.

  • name – optional string giving the name identifying string, to be used in error messages.

OUTPUT:

  • A list consisting of strings and doctest.Example instances. There will be at least one string between successive examples (exactly one unless long or optional tests are removed), and the list will begin and end with a string.

EXAMPLES:

sage: from sage.doctest.parsing import SageDocTestParser
sage: DTP = SageDocTestParser(('sage','magma','guava'))
sage: example = 'Explanatory text::\n\n    sage: E = magma("EllipticCurve([1, 1, 1, -10, -10])") # optional: magma\n\nLater text'
sage: parsed = DTP.parse(example)
sage: parsed[0]
'Explanatory text::\n\n'
sage: parsed[1].sage_source
'E = magma("EllipticCurve([1, 1, 1, -10, -10])") # optional: magma\n'
sage: parsed[2]
'\nLater text'

If the doctest parser was not created to accept a given optional tag, the corresponding examples are simply removed:

sage: DTP2 = SageDocTestParser(('sage',))
sage: parsed2 = DTP2.parse(example)
sage: parsed2
['Explanatory text::\n\n', '\nLater text']

You can mark doctests as having a particular tolerance:

sage: example2 = 'sage: gamma(1.6) # tol 2.0e-11\n0.893515349287690'
sage: ex = DTP.parse(example2)[1]
sage: ex.sage_source
'gamma(1.6) # tol 2.0e-11\n'
sage: ex.want
u'0.893515349287690\n'
sage: type(ex.want)
<class 'sage.doctest.parsing.MarkedOutput'>
sage: ex.want.tol
2.000000000000000000...?e-11

You can use continuation lines:

sage: s = "sage: for i in range(4):\n....:     print(i)\n....:\n"
sage: ex = DTP2.parse(s)[1]
sage: ex.source
'for i in range(Integer(4)):\n    print(i)\n'

Sage currently treats a trailing backslash as an instruction to join the current line with the next one. This allows large integers to be broken over multiple lines, but it is not standard Python doctest behavior. It is not guaranteed to persist, but it works as of Sage 5.5:

sage: n = 1234\
....:     5678
sage: print(n)
12345678
sage: type(n)
<class 'sage.rings.integer.Integer'>

It also works without the ....: continuation prompt:

sage: m = 8765\
4321
sage: print(m)
87654321
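One way to implement this joining, sketched under the assumption that it happens on the source text before prompts are handled (not necessarily how Sage does it), is to delete each backslash-newline pair together with the indentation and optional ....: prompt that follows it. join_backslash_lines is a hypothetical helper.

```python
import re

# Hypothetical pattern: a trailing backslash, the newline, then optional
# indentation and an optional "....:" continuation prompt with its spaces.
BACKSLASH_JOIN = re.compile(r"\\\n[ \t]*(?:\.\.\.\.:[ \t]*)?")


def join_backslash_lines(source):
    return BACKSLASH_JOIN.sub("", source)


print(join_backslash_lines("n = 1234\\\n....:     5678"))  # n = 12345678
print(join_backslash_lines("m = 8765\\\n4321"))            # m = 87654321
```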

Test that trac ticket #26575 is resolved:

sage: example3 = 'sage: Zp(5,4,print_mode="digits")(5)\n...00010'
sage: parsed3 = DTP.parse(example3)
sage: dte = parsed3[1]
sage: dte.sage_source
'Zp(5,4,print_mode="digits")(5)\n'
sage: dte.want
'...00010\n'
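The idea of reusing the standard parser can be sketched by rewriting the sage: and ....: prompts into the standard >>> and ... prompts and then calling doctest.DocTestParser.parse. This is a simplified assumption about the approach: it ignores preparsing, optional tags and tolerances, and parse_sage_prompts, PS1 and PS2 are hypothetical names.

```python
import doctest
import re

# Hypothetical prompt patterns, anchored at the start of each line.
PS1 = re.compile(r"^([ ]*)sage: ?", re.MULTILINE)
PS2 = re.compile(r"^([ ]*)\.\.\.\.: ?", re.MULTILINE)


def parse_sage_prompts(text, name="<string>"):
    # Rewrite the prompts; doctest then strips them along with the indentation.
    text = PS1.sub(r"\1>>> ", text)
    text = PS2.sub(r"\1... ", text)
    # Returns a list alternating between strings and doctest.Example objects.
    return doctest.DocTestParser().parse(text, name)


doc = ("Explanatory text::\n\n"
       "    sage: for i in range(2):\n"
       "    ....:     print(i)\n"
       "    0\n"
       "    1\n\n"
       "Later text")
parts = parse_sage_prompts(doc)
example = parts[1]
print(repr(example.source))  # 'for i in range(2):\n    print(i)\n'
print(repr(example.want))    # '0\n1\n'
```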
class sage.doctest.parsing.SageOutputChecker

Bases: doctest.OutputChecker

A modification of the doctest OutputChecker that can check relative and absolute tolerance of answers.

EXAMPLES:

sage: from sage.doctest.parsing import SageOutputChecker, MarkedOutput, SageDocTestParser
sage: import doctest
sage: optflag = doctest.NORMALIZE_WHITESPACE|doctest.ELLIPSIS
sage: DTP = SageDocTestParser(('sage','magma','guava'))
sage: OC = SageOutputChecker()
sage: example2 = 'sage: gamma(1.6) # tol 2.0e-11\n0.893515349287690'
sage: ex = DTP.parse(example2)[1]
sage: ex.sage_source
'gamma(1.6) # tol 2.0e-11\n'
sage: ex.want
u'0.893515349287690\n'
sage: type(ex.want)
<class 'sage.doctest.parsing.MarkedOutput'>
sage: ex.want.tol
2.000000000000000000...?e-11
sage: OC.check_output(ex.want, '0.893515349287690', optflag)
True
sage: OC.check_output(ex.want, '0.8935153492877', optflag)
True
sage: OC.check_output(ex.want, '0', optflag)
False
sage: OC.check_output(ex.want, 'x + 0.8935153492877', optflag)
False
add_tolerance(wantval, want)

Enlarge the real interval element wantval according to the tolerance options in want.

INPUT:

  • wantval – a real interval element

  • want – a MarkedOutput describing the tolerance

OUTPUT:

  • an interval element containing wantval

EXAMPLES:

sage: from sage.doctest.parsing import MarkedOutput, SageOutputChecker
sage: OC = SageOutputChecker()
sage: want_tol = MarkedOutput().update(tol=0.0001)
sage: want_abs = MarkedOutput().update(abs_tol=0.0001)
sage: want_rel = MarkedOutput().update(rel_tol=0.0001)
sage: OC.add_tolerance(RIF(pi.n(64)), want_tol).endpoints()
(3.14127849432443, 3.14190681285516)
sage: OC.add_tolerance(RIF(pi.n(64)), want_abs).endpoints()
(3.14149265358979, 3.14169265358980)
sage: OC.add_tolerance(RIF(pi.n(64)), want_rel).endpoints()
(3.14127849432443, 3.14190681285516)
sage: OC.add_tolerance(RIF(1e1000), want_tol)
1.000?e1000
sage: OC.add_tolerance(RIF(1e1000), want_abs)
1.000000000000000?e1000
sage: OC.add_tolerance(RIF(1e1000), want_rel)
1.000?e1000
sage: OC.add_tolerance(0, want_tol)
0.000?
sage: OC.add_tolerance(0, want_abs)
0.000?
sage: OC.add_tolerance(0, want_rel)
0
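The enlargement rules these examples exhibit can be sketched with exact rationals in place of Sage's real intervals. The branching below (tol behaves like abs_tol at zero and like rel_tol elsewhere; a relative tolerance cannot widen zero) is inferred from the outputs above rather than copied from Sage, and add_tolerance_sketch is a hypothetical name.

```python
from fractions import Fraction


def add_tolerance_sketch(value, tol=0, abs_tol=0, rel_tol=0):
    """Hypothetical helper: return (lo, hi) enclosing value, widened by one tolerance kind."""
    v = Fraction(value)
    tol, abs_tol, rel_tol = Fraction(tol), Fraction(abs_tol), Fraction(rel_tol)
    if tol:
        # "tol": absolute at zero, relative everywhere else.
        radius = tol if v == 0 else abs(v) * tol
    elif abs_tol:
        radius = abs_tol
    elif rel_tol:
        radius = abs(v) * rel_tol  # zero stays exactly zero
    else:
        radius = Fraction(0)
    return (v - radius, v + radius)


print(add_tolerance_sketch("10.0", abs_tol="0.1"))  # (Fraction(99, 10), Fraction(101, 10))
```

Strings are passed instead of floats so that "0.1" stays exactly one tenth.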
check_output(want, got, optionflags)

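Check whether the output matches the desired output.

If want is a MarkedOutput instance, the desired tolerance is taken into account.

INPUT:

  • want – a string or MarkedOutput

  • got – a string

  • optionflags – an integer, passed down to the standard doctest output checker

OUTPUT: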

  • boolean, whether got matches want up to the specified tolerance.

EXAMPLES:

sage: from sage.doctest.parsing import MarkedOutput, SageOutputChecker
sage: import doctest
sage: optflag = doctest.NORMALIZE_WHITESPACE|doctest.ELLIPSIS
sage: rndstr = MarkedOutput("I'm wrong!").update(random=True)
sage: tentol = MarkedOutput("10.0").update(tol=.1)
sage: tenabs = MarkedOutput("10.0").update(abs_tol=.1)
sage: tenrel = MarkedOutput("10.0").update(rel_tol=.1)
sage: zerotol = MarkedOutput("0.0").update(tol=.1)
sage: zeroabs = MarkedOutput("0.0").update(abs_tol=.1)
sage: zerorel = MarkedOutput("0.0").update(rel_tol=.1)
sage: zero = "0.0"
sage: nf = "9.5"
sage: ten = "10.05"
sage: eps = "-0.05"
sage: OC = SageOutputChecker()
sage: OC.check_output(rndstr,nf,optflag)
True

sage: OC.check_output(tentol,nf,optflag)
True
sage: OC.check_output(tentol,ten,optflag)
True
sage: OC.check_output(tentol,zero,optflag)
False

sage: OC.check_output(tenabs,nf,optflag)
False
sage: OC.check_output(tenabs,ten,optflag)
True
sage: OC.check_output(tenabs,zero,optflag)
False

sage: OC.check_output(tenrel,nf,optflag)
True
sage: OC.check_output(tenrel,ten,optflag)
True
sage: OC.check_output(tenrel,zero,optflag)
False

sage: OC.check_output(zerotol,zero,optflag)
True
sage: OC.check_output(zerotol,eps,optflag)
True
sage: OC.check_output(zerotol,ten,optflag)
False

sage: OC.check_output(zeroabs,zero,optflag)
True
sage: OC.check_output(zeroabs,eps,optflag)
True
sage: OC.check_output(zeroabs,ten,optflag)
False

sage: OC.check_output(zerorel,zero,optflag)
True
sage: OC.check_output(zerorel,eps,optflag)
False
sage: OC.check_output(zerorel,ten,optflag)
False

More explicit tolerance checks:

sage: _ = x  # rel tol 1e10
sage: raise RuntimeError   # rel tol 1e10
Traceback (most recent call last):
...
RuntimeError
sage: 1  # abs tol 2
-0.5
sage: print("0.9999")    # rel tol 1e-4
1.0
sage: print("1.00001")   # abs tol 1e-5
1.0
sage: 0  # rel tol 1
1

Spaces before numbers or between the sign and number are ignored:

sage: print("[ - 1, 2]")  # abs tol 1e-10
[-1,2]

On Python 3, string results with a unicode u prefix are still accepted:

sage: a = u'Cyrano'; a
u'Cyrano'
sage: b = [u'Fermat', u'Euler']; b
[u'Fermat',  u'Euler']
sage: c = u'you'; c
u'you'

The checker also allows for the difference in the reprs of type instances (i.e. classes) between Python 2 and Python 3:

sage: int
<class 'int'>
sage: float
<class 'float'>
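A simplified plain-Python version of the comparison, as an illustration only: the non-numeric text must agree exactly once numbers are masked out, and each number must then lie within its tolerance. It uses exact rationals, supports only abs_tol and rel_tol, and ignores the whitespace and repr normalizations described above; numbers_match and NUMBER are hypothetical names.

```python
import re
from fractions import Fraction

# Hypothetical pattern: signed decimal literals with optional exponent,
# e.g. "-0.05" or "2.0e-11".
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def numbers_match(want, got, abs_tol=0, rel_tol=0):
    # The text around the numbers must agree exactly.
    if NUMBER.sub("#", want) != NUMBER.sub("#", got):
        return False
    for w, g in zip(NUMBER.findall(want), NUMBER.findall(got)):
        w, g = Fraction(w), Fraction(g)
        allowed = Fraction(abs_tol) + abs(w) * Fraction(rel_tol)
        if abs(g - w) > allowed:
            return False
    return True


print(numbers_match("10.0", "9.5", abs_tol="0.1"))   # False
print(numbers_match("10.0", "9.5", rel_tol="0.1"))   # True
print(numbers_match("0.0", "-0.05", rel_tol="0.1"))  # False
```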
human_readable_escape_sequences(string)

Make ANSI escape sequences human readable.

EXAMPLES:

sage: print('This is \x1b[1mbold\x1b[0m text')
This is <CSI-1m>bold<CSI-0m> text
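A sketch that handles only CSI sequences (ESC followed by [) and renders them in the <CSI-...> style shown above; other kinds of escape sequence are left untouched. readable_escapes is a hypothetical name, not Sage's implementation.

```python
import re

# ECMA-48 control sequence: ESC "[", parameter bytes 0x30-0x3F,
# intermediate bytes 0x20-0x2F, then one final byte 0x40-0x7E.
CSI = re.compile(r"\x1b\[([0-?]*[ -/]*[@-~])")


def readable_escapes(text):
    # Hypothetical helper: replace each CSI sequence by a visible tag.
    return CSI.sub(lambda m: "<CSI-%s>" % m.group(1), text)


print(readable_escapes("This is \x1b[1mbold\x1b[0m text"))
# This is <CSI-1m>bold<CSI-0m> text
```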
output_difference(example, got, optionflags)

Report on the differences between the desired result and what was actually obtained.

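If example.want is a MarkedOutput instance, the desired tolerance is taken into account.

INPUT:

  • example – a doctest example with a want attribute

  • got – a string

  • optionflags – an integer, passed down to the standard doctest output checker

OUTPUT: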

  • a string, describing how got fails to match example.want

EXAMPLES:

sage: from sage.doctest.parsing import MarkedOutput, SageOutputChecker
sage: import doctest
sage: optflag = doctest.NORMALIZE_WHITESPACE|doctest.ELLIPSIS
sage: tentol = doctest.Example('',MarkedOutput("10.0\n").update(tol=.1))
sage: tenabs = doctest.Example('',MarkedOutput("10.0\n").update(abs_tol=.1))
sage: tenrel = doctest.Example('',MarkedOutput("10.0\n").update(rel_tol=.1))
sage: zerotol = doctest.Example('',MarkedOutput("0.0\n").update(tol=.1))
sage: zeroabs = doctest.Example('',MarkedOutput("0.0\n").update(abs_tol=.1))
sage: zerorel = doctest.Example('',MarkedOutput("0.0\n").update(rel_tol=.1))
sage: tlist = doctest.Example('',MarkedOutput("[10.0, 10.0, 10.0, 10.0, 10.0, 10.0]\n").update(abs_tol=0.987))
sage: zero = "0.0"
sage: nf = "9.5"
sage: ten = "10.05"
sage: eps = "-0.05"
sage: L = "[9.9, 8.7, 10.3, 11.2, 10.8, 10.0]"
sage: OC = SageOutputChecker()
sage: print(OC.output_difference(tenabs,nf,optflag))
Expected:
    10.0
Got:
    9.5
Tolerance exceeded:
    10.0 vs 9.5, tolerance 5e-1 > 1e-1

sage: print(OC.output_difference(tentol,zero,optflag))
Expected:
    10.0
Got:
    0.0
Tolerance exceeded:
    10.0 vs 0.0, tolerance 1e0 > 1e-1

sage: print(OC.output_difference(tentol,eps,optflag))
Expected:
    10.0
Got:
    -0.05
Tolerance exceeded:
    10.0 vs -0.05, tolerance 2e0 > 1e-1

sage: print(OC.output_difference(tlist,L,optflag))
Expected:
    [10.0, 10.0, 10.0, 10.0, 10.0, 10.0]
Got:
    [9.9, 8.7, 10.3, 11.2, 10.8, 10.0]
Tolerance exceeded in 2 of 6:
    10.0 vs 8.7, tolerance 2e0 > 9.87e-1
    10.0 vs 11.2, tolerance 2e0 > 9.87e-1
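The "Tolerance exceeded in 2 of 6" summary can be understood as counting the number pairs that fall outside the tolerance. The sketch below reproduces that count for an absolute tolerance only; it omits the tolerance figures printed in Sage's report, and exceeded_pairs is a hypothetical helper.

```python
import re
from fractions import Fraction

NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?")


def exceeded_pairs(want, got, abs_tol):
    """Hypothetical helper: (expected, obtained) pairs differing by more than abs_tol."""
    tol = Fraction(abs_tol)
    pairs = zip(NUMBER.findall(want), NUMBER.findall(got))
    return [(w, g) for w, g in pairs if abs(Fraction(g) - Fraction(w)) > tol]


want = "[10.0, 10.0, 10.0, 10.0, 10.0, 10.0]"
got = "[9.9, 8.7, 10.3, 11.2, 10.8, 10.0]"
bad = exceeded_pairs(want, got, "0.987")
print("Tolerance exceeded in %d of %d:" % (len(bad), len(NUMBER.findall(want))))
for w, g in bad:
    print("    %s vs %s" % (w, g))
```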
sage.doctest.parsing.get_source(example)

Return the source with the leading ‘sage: ’ stripped off, that is, example.sage_source if it is set and example.source otherwise.

EXAMPLES:

sage: from sage.doctest.parsing import get_source
sage: from sage.doctest.sources import DictAsObject
sage: example = DictAsObject({})
sage: example.sage_source = "2 + 2"
sage: example.source = "sage: 2 + 2"
sage: get_source(example)
'2 + 2'
sage: example = DictAsObject({})
sage: example.source = "3 + 3"
sage: get_source(example)
'3 + 3'
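The two examples above amount to an attribute lookup with a fallback. Assuming that is all get_source does, a one-line plain-Python equivalent, under the hypothetical name get_source_sketch, is:

```python
from types import SimpleNamespace


def get_source_sketch(example):
    # Prefer the unpreparsed Sage source when the example records one.
    return getattr(example, "sage_source", example.source)


print(get_source_sketch(SimpleNamespace(sage_source="2 + 2", source="sage: 2 + 2")))  # 2 + 2
print(get_source_sketch(SimpleNamespace(source="3 + 3")))                             # 3 + 3
```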
sage.doctest.parsing.make_marked_output(s, D)

Auxiliary function for pickling: rebuild a MarkedOutput from the string s and the dictionary D of its attributes.

EXAMPLES:

sage: from sage.doctest.parsing import make_marked_output
sage: s = make_marked_output("0.0007401", {'abs_tol':.0000001})
sage: s
u'0.0007401'
sage: s.abs_tol
1.00000000000000e-7
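A str subclass with extra attributes must tell pickle how to rebuild itself, typically by having __reduce__ return a module-level factory function and its arguments. The sketch below assumes MarkedOutput works in this way; TaggedStr and make_tagged_str are hypothetical names.

```python
import pickle


class TaggedStr(str):
    """Hypothetical str subclass carrying tolerance attributes."""

    def update(self, **kwds):
        self.__dict__.update(kwds)
        return self

    def __reduce__(self):
        # Rebuild from the plain string plus the attribute dictionary.
        return make_tagged_str, (str(self), self.__dict__)


def make_tagged_str(s, D):
    # Must live at module level so that pickle can find it by name.
    return TaggedStr(s).update(**D)


s = TaggedStr("0.0007401").update(abs_tol=1e-7)
t = pickle.loads(pickle.dumps(s))
print(t, t.abs_tol, type(t).__name__)  # 0.0007401 1e-07 TaggedStr
```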
sage.doctest.parsing.normalize_bound_method_repr(s)

Normalize differences between Python 2 and 3 in how bound methods are represented.

On Python 2 bound methods are represented using the class name of the object the method was bound to, whereas on Python 3 they are represented with the fully-qualified name of the function that implements the method.

In the context of a doctest it’s almost impossible to convert accurately from the latter to the former or vice-versa, so we simplify the reprs of bound methods to just the bare method name.

This is slightly regressive, since it means one can no longer use the repr of a bound method to test whether some element gets a method from the correct class (sometimes important in the case of dynamic classes). However, such tests could be written more explicitly to emphasize that they are testing such behavior.

EXAMPLES:

sage: from sage.doctest.parsing import normalize_bound_method_repr
sage: el = Semigroups().example().an_element()
sage: el
42
sage: el.is_idempotent
<bound method ....is_idempotent of 42>
sage: normalize_bound_method_repr(repr(el.is_idempotent))
'<bound method is_idempotent of 42>'

An example where the object repr contains whitespace:

sage: U = DisjointUnionEnumeratedSets(
....:          Family([1, 2, 3], Partitions), facade=False)
sage: U._element_constructor_
<bound method ...._element_constructor_default of Disjoint union of
Finite family {...}>
sage: normalize_bound_method_repr(repr(U._element_constructor_))
'<bound method _element_constructor_default of Disjoint union of Finite
family {...}>'
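A single regular expression captures the simplification described above: drop any dotted qualifier in front of the method name. This is a sketch under that assumption, not Sage's actual pattern; bare_bound_method_repr is a hypothetical name, and qualified names containing non-word parts such as <locals> are left unchanged.

```python
import re

# "<bound method ", an optional dotted qualifier, then the bare method name.
BOUND_METHOD = re.compile(r"<bound method (?:[\w.]+\.)?(\w+) of ")


def bare_bound_method_repr(s):
    # Hypothetical helper: keep only the final component of the method name.
    return BOUND_METHOD.sub(r"<bound method \1 of ", s)


print(bare_bound_method_repr(
    "<bound method Semigroups.ElementMethods.is_idempotent of 42>"))
# <bound method is_idempotent of 42>
```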
sage.doctest.parsing.normalize_long_repr(s)

Simple conversion from the Python 2 representation of long ints (that is, integers with the L suffix) to the Python 3 representation (the same number without the suffix, since Python 3 does not have a distinct long type).

Note: This just uses a simple regular expression that can’t distinguish representations of long objects from strings containing a long repr.

EXAMPLES:

sage: from sage.doctest.parsing import normalize_long_repr
sage: normalize_long_repr('10L')
'10'
sage: normalize_long_repr('[10L, -10L, +10L, "ALL"]')
'[10, -10, +10, "ALL"]'
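A plausible regular expression for this conversion, offered as an assumption rather than Sage's actual pattern: remove an L that directly follows a run of digits, provided the digits are not part of a longer identifier or number and the L ends the word. Like the function described above, it cannot tell a long repr from the same text inside a string; strip_long_suffix is a hypothetical name.

```python
import re

# Digits not preceded by a word character or a dot, then "L" at a word boundary.
LONG_LITERAL = re.compile(r"(?<![\w.])(\d+)L\b")


def strip_long_suffix(s):
    # Hypothetical helper: "10L" -> "10", leaving "ALL" and "x10L" alone.
    return LONG_LITERAL.sub(r"\1", s)


print(strip_long_suffix('[10L, -10L, +10L, "ALL"]'))  # [10, -10, +10, "ALL"]
```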
sage.doctest.parsing.normalize_type_repr(s)

Convert the repr of type objects (e.g. int, float) from their Python 2 representation to their Python 3 representation.

In Python 2, the repr of built-in types like int looks like <type 'int'>, whereas user-defined pure Python classes are displayed as <class 'classname'>. Python 3 normalized this, so that built-in types are represented the same way as user-defined classes (e.g. <class 'int'>).

This simply normalizes all class/type reprs to the Python 3 convention for the sake of output checking.

EXAMPLES:

sage: from sage.doctest.parsing import normalize_type_repr
sage: s = "<type 'int'>"
sage: normalize_type_repr(s)
"<class 'int'>"
sage: normalize_type_repr(repr(float))
"<class 'float'>"

This can work on multi-line output as well:

sage: s = "The desired output was <class 'int'>\n"
sage: s += "The received output was <type 'int'>"
sage: print(normalize_type_repr(s))
The desired output was <class 'int'>
The received output was <class 'int'>
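The conversion amounts to rewriting <type '...'> into <class '...'>, which a single regular expression can do anywhere in the text, including on every line of multi-line output. A sketch with the hypothetical name py3_type_repr, which may differ from Sage's actual pattern:

```python
import re

TYPE_REPR = re.compile(r"<type '([^']+)'>")


def py3_type_repr(s):
    # Hypothetical helper: Python 2 "<type 'X'>" becomes Python 3 "<class 'X'>".
    return TYPE_REPR.sub(r"<class '\1'>", s)


print(py3_type_repr("The received output was <type 'int'>"))
# The received output was <class 'int'>
```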

It should also work when types are embedded in other nested expressions:

sage: normalize_type_repr(repr([Integer, float]))
"[<class 'sage.rings.integer.Integer'>, <class 'float'>]"
sage.doctest.parsing.parse_optional_tags(string)

Return a set consisting of the optional tags from the following set that occur in a comment on the first line of the input string.

  • ‘long time’

  • ‘not implemented’

  • ‘not tested’

  • ‘known bug’

  • ‘py2’

  • ‘py3’

  • ‘arb216’

  • ‘arb218’

  • ‘optional: PKG_NAME’ – the set will just contain ‘PKG_NAME’

EXAMPLES:

sage: from sage.doctest.parsing import parse_optional_tags
sage: parse_optional_tags("sage: magma('2 + 2')# optional: magma")
{'magma'}
sage: parse_optional_tags("sage: #optional -- mypkg")
{'mypkg'}
sage: parse_optional_tags("sage: print(1)  # parentheses are optional here")
set()
sage: parse_optional_tags("sage: print(1)  # optional")
{''}
sage: sorted(list(parse_optional_tags("sage: #optional -- foo bar, baz")))
['bar', 'foo']
sage: sorted(list(parse_optional_tags("    sage: factor(10^(10^10) + 1) # LoNg TiME, NoT TeSTED; OptioNAL -- P4cka9e")))
['long time', 'not tested', 'p4cka9e']
sage: parse_optional_tags("    sage: raise RuntimeError # known bug")
{'bug'}
sage: sorted(list(parse_optional_tags("    sage: determine_meaning_of_life() # long time, not implemented")))
['long time', 'not implemented']

We don’t parse inside strings:

sage: parse_optional_tags("    sage: print('  # long time')")
set()
sage: parse_optional_tags("    sage: print('  # long time')  # not tested")
{'not tested'}

UTF-8 works:

sage: parse_optional_tags("'ěščřžýáíéďĎ'")
set()
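The "we don't parse inside strings" behaviour suggests using Python's tokenizer to find the real comment. The simplified sketch below does that, then looks for the fixed tags and for package names after "optional:" or "optional --". It is an assumption about the approach and does not reproduce every edge case above (for example a bare "# optional" or the "known bug" result); first_comment and optional_tags_sketch are hypothetical names.

```python
import io
import re
import tokenize

FIXED_TAGS = ("long time", "not implemented", "not tested", "known bug",
              "py2", "py3", "arb216", "arb218")
# "optional" followed by ":" or "--", then package names.
OPTIONAL = re.compile(r"optional\s*(?::|--)\s*([\w ,]*)")


def first_comment(line):
    """Hypothetical helper: text of the first comment on the first line, or ''."""
    first = line.splitlines()[0] if line else ""
    code = re.sub(r"^\s*sage:\s?", "", first)
    try:
        # tokenize distinguishes a real comment from a '#' inside a string.
        for tok in tokenize.generate_tokens(io.StringIO(code).readline):
            if tok.type == tokenize.COMMENT:
                return tok.string[1:]
    except tokenize.TokenError:
        pass  # e.g. an unclosed bracket continuing onto the next line
    return ""


def optional_tags_sketch(line):
    comment = first_comment(line).lower()
    tags = {tag for tag in FIXED_TAGS if tag in comment}
    match = OPTIONAL.search(comment)
    if match:
        tags.update(re.findall(r"\w+", match.group(1)))
    return tags


print(sorted(optional_tags_sketch("sage: x = 1  # long time, optional -- mypkg")))
# ['long time', 'mypkg']
```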
sage.doctest.parsing.parse_tolerance(source, want)

Return a version of want marked up with the tolerance tags specified in source.

INPUT:

  • source – a string, the source of a doctest

  • want – a string, the desired output of the doctest

OUTPUT:

  • want if there are no tolerance tags specified; a MarkedOutput version otherwise.

EXAMPLES:

sage: from sage.doctest.parsing import parse_tolerance
sage: marked = parse_tolerance("sage: s.update(abs_tol = .0000001)", "")
sage: type(marked)
<... 'str'>
sage: marked = parse_tolerance("sage: s.update(tol = 0.1); s.rel_tol # abs tol     0.01 ", "")
sage: marked.tol
0
sage: marked.rel_tol
0
sage: marked.abs_tol
0.010000000000000000000...?
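A rough sketch of how such a tag could be recognised: search for "#", then an optional "abs" or "rel", then "tol" and its value, and map the result to the tol, abs_tol or rel_tol attribute names used above. Unlike the tokenizer-based approach suggested for optional tags, this regular expression would also react to a "#" inside a string literal; TOLERANCE_TAG and tolerance_tag are hypothetical names.

```python
import re
from fractions import Fraction

# "#", the shortest text up to an optional "abs"/"rel" word, then "tol" and its value.
TOLERANCE_TAG = re.compile(r"#.*?\b(?:(abs|rel)\s+)?tol\s+(\S+)", re.IGNORECASE)


def tolerance_tag(source):
    """Hypothetical helper: (attribute name, exact value) for a tolerance tag, or None."""
    match = TOLERANCE_TAG.search(source)
    if match is None:
        return None
    kind = (match.group(1) or "").lower()
    name = kind + "_tol" if kind else "tol"
    return name, Fraction(match.group(2))


print(tolerance_tag("sage: gamma(1.6) # tol 2.0e-11"))
```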
sage.doctest.parsing.pre_hash(s)

Prefix a string with its length.

EXAMPLES:

sage: from sage.doctest.parsing import pre_hash
sage: pre_hash("abc")
'3:abc'
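A plausible reason for the length prefix, offered as an assumption rather than documented intent: when several pieces are concatenated before hashing, the prefix keeps different ways of splitting the same text from producing identical input. pre_hash_sketch is a hypothetical name.

```python
def pre_hash_sketch(s):
    # Hypothetical helper: prefix with the length and a colon, e.g. "abc" -> "3:abc".
    return "%d:%s" % (len(s), s)


# Without the prefix, ["ab", "c"] and ["a", "bc"] concatenate to the same text.
print("".join(["ab", "c"]) == "".join(["a", "bc"]))  # True
print("".join(map(pre_hash_sketch, ["ab", "c"])) ==
      "".join(map(pre_hash_sketch, ["a", "bc"])))    # False
```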
sage.doctest.parsing.reduce_hex(fingerprints)

Return a symmetric function of the arguments, as a hex string.

The arguments should be 32-character strings consisting of hex digits: 0-9 and a-f.

EXAMPLES:

sage: from sage.doctest.parsing import reduce_hex
sage: reduce_hex(["abc", "12399aedf"])
'0000000000000000000000012399a463'
sage: reduce_hex(["12399aedf","abc"])
'0000000000000000000000012399a463'
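Bitwise XOR of the values reproduces both examples above (0xaedf XOR 0xabc = 0xa463), so it is one natural choice of symmetric function; whether Sage uses exactly XOR is an assumption here, and reduce_hex_sketch is a hypothetical name.

```python
from functools import reduce
import operator


def reduce_hex_sketch(fingerprints):
    # XOR is commutative and associative, so the result ignores argument order.
    value = reduce(operator.xor, (int(h, 16) for h in fingerprints), 0)
    # Zero-pad to 32 hex digits, matching the expected argument length.
    return "%032x" % value


print(reduce_hex_sketch(["abc", "12399aedf"]))  # 0000000000000000000000012399a463
```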
sage.doctest.parsing.remove_unicode_u(string)

Given a string, try to remove all unicode u prefixes inside.

This helps keep doctest results the same in Python 2 and Python 3. The input string is typically the documentation of a method or function. This string may contain letters u that are Python 2 unicode prefixes; the aim is to remove all of these prefixes and only them.

INPUT:

  • string – either unicode or bytes (if bytes, it will be converted to unicode assuming UTF-8)

OUTPUT: unicode string

EXAMPLES:

sage: from sage.doctest.parsing import remove_unicode_u as remu
sage: remu("u'you'")
u"'you'"
sage: remu('u')
u'u'
sage: remu("[u'am', 'stram', u'gram']")
u"['am', 'stram', 'gram']"
sage: remu('[u"am", "stram", u"gram"]')
u'["am", "stram", "gram"]'

This deals correctly with nested quotes:

sage: str = '''[u"Singular's stuff", u'good']'''
sage: print(remu(str))
["Singular's stuff", 'good']
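A regular-expression approximation of this behaviour: remove a u that directly precedes a quote, unless the u itself follows a word character or a quote (so the u in 'you' and the one-letter string 'u' survive). It is a sketch, not a tokenizer, so text such as ' u"' inside a string would also lose its u; remove_u_prefixes is a hypothetical name.

```python
import re

# A 'u' directly before a quote, not preceded by a word character or a quote.
UNICODE_PREFIX = re.compile(r"""(?<![\w'"])u(?=['"])""")


def remove_u_prefixes(string):
    # Hypothetical helper: accept bytes as well, assuming UTF-8.
    if isinstance(string, bytes):
        string = string.decode("utf-8")
    return UNICODE_PREFIX.sub("", string)


print(remove_u_prefixes("[u'am', 'stram', u'gram']"))  # ['am', 'stram', 'gram']
```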